1  Introduction


This article deals with an empirical Bayes approach to the item response theory (IRT) modeling of psychological tests, where "empirical Bayes" means that latent ability is treated as randomly sampled from a population of examinees. Suppose we randomly sample N persons from a specified population and then administer a test consisting of n items. The data structure for a randomly selected examinee can be expressed by a random vector


$$(X_1, \ldots, X_n, \theta),$$


where $X_1, \ldots, X_n$ denote item responses and $\theta$ denotes examinee ability, which is unobservable. Abstractly, in an empirical Bayes problem the data are modeled by independent and identically distributed (i.i.d.) random vectors


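$$(X_1^{(1)}, \ldots, X_n^{(1)}, \theta_1),\ (X_1^{(2)}, \ldots, X_n^{(2)}, \theta_2),\ \ldots,\ (X_1^{(N)}, \ldots, X_n^{(N)}, \theta_N).$$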


One important measurement goal is the estimation (prediction) of each examinee's $\theta$. Clearly one should use the first examinee's responses $X_1^{(1)}, \ldots, X_n^{(1)}$ to predict the actual value of $\theta_1$. However, unless the distribution of $\theta$ is completely specified, there is useful information in

$$(X_1^{(2)}, \ldots, X_n^{(2)}),\ (X_1^{(3)}, \ldots, X_n^{(3)}),\ \ldots,\ (X_1^{(N)}, \ldots, X_n^{(N)}),$$

the second through Nth examinees' responses, about the unknown distribution of $\theta$, and thus about the particular unknown ability $\theta_1$ that we want to estimate. Thus an alternative to using only $(X_1^{(1)}, \ldots, X_n^{(1)})$ is to use all of the test responses in making inferences about $\theta_1$.


Let $X_j$ be the score of a randomly selected examinee on the $j$th item, with $X_j = 1$ if the answer is correct and $X_j = 0$ if it is incorrect; that is,

$$X_j = \begin{cases} 1 & \text{with probability } P_j(\theta), \\ 0 & \text{with probability } 1 - P_j(\theta), \end{cases}$$

where $P_j(\theta)$ denotes the probability of a correct response for a randomly chosen examinee of ability $\theta$, that is,

$$P_j(\theta) = P\{X_j = 1 \mid \theta\},$$

where $\theta$ is unknown and has the domain $(-\infty, \infty)$ or some subinterval of $(-\infty, \infty)$.
We  make  two  assumptions  about  the  IRT  models  of  this  paper: 

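(a) Local Independence (also called Conditional Independence):

$$P_n(x_1, \ldots, x_n \mid \theta) = P\{(X_1, \ldots, X_n) = (x_1, \ldots, x_n) \mid \theta\} = \prod_{j=1}^{n} P\{X_j = x_j \mid \theta\} = \prod_{j=1}^{n} P_j(\theta)^{x_j}\{1 - P_j(\theta)\}^{1 - x_j}.$$

(b) Monotonicity: each $P_j(\theta)$ is strictly increasing in $\theta$.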
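To make assumptions (a) and (b) concrete, the following minimal Python sketch evaluates $P_n(x_1, \ldots, x_n \mid \theta)$ as a product over items. It assumes a two-parameter logistic (2PL) form $P_j(\theta) = 1/[1 + \exp\{-a_j(\theta - b_j)\}]$ with hypothetical parameters $(a_j, b_j)$, $a_j > 0$; the results of this paper do not depend on that choice, and the names `icc` and `pattern_prob` are illustrative only.

```python
import math

def icc(theta, a, b):
    """Hypothetical 2PL curve P_j(theta) = 1 / (1 + exp(-a (theta - b))).
    With a > 0 it is strictly increasing in theta (assumption (b))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pattern_prob(x, theta, items):
    """P_n(x_1, ..., x_n | theta) under local independence (assumption (a)):
    the product of P_j(theta)^x_j * (1 - P_j(theta))^(1 - x_j) over the items."""
    prob = 1.0
    for xj, (a, b) in zip(x, items):
        p = icc(theta, a, b)
        prob *= p ** xj * (1.0 - p) ** (1 - xj)
    return prob

# Hypothetical (a_j, b_j) parameters for a three-item test.
items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]
print(pattern_prob([1, 1, 0], 0.0, items))
```

Summing `pattern_prob` over all $2^n$ response patterns gives 1 for every $\theta$, as it must for a conditional distribution.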

Lord (1980) makes an interesting remark about the existence of a prior distribution for ability:

“In work with published tests, it is usual to test similar groups of examinees year after year with parallel forms of the same test. When this happens, we can form a good picture of the frequency distribution of ability in the next group of examinees to be tested.”

This suggests taking an empirical Bayes approach to IRT modeling, in particular assuming partial knowledge about the distribution of $\theta$ and thereby being able to make efficient use of the response data to make inferences about the distribution of $\theta$, and thus about the unobservable examinee abilities. The distribution of a test response $X_1, \ldots, X_n$ is indexed by $\theta$, which belongs to the parameter space $\Theta$; that is, each $\theta \in \Theta$ governs a test response distribution. Let $L_n(\theta)$ denote the log-likelihood, that is,

$$L_n(\theta) = \log\{P_n(X_1, \ldots, X_n \mid \theta)\}.$$
If we assume that the prior distribution has density $\Pi(\theta)$, then, according to Bayes' theorem, the posterior density for each given

$$(X_1, \ldots, X_n) = (x_1, \ldots, x_n)$$

can be written as

$$\Pi_n(\theta \mid x_1, \ldots, x_n) = \frac{P_n(x_1, \ldots, x_n \mid \theta)\,\Pi(\theta)}{P_n(x_1, \ldots, x_n)} = \frac{\exp\{L_n(\theta)\}\,\Pi(\theta)}{P_n(x_1, \ldots, x_n)}, \qquad (1)$$

where

$$P_n(x_1, \ldots, x_n) = \int_\Theta P_n(x_1, \ldots, x_n \mid \theta)\,\Pi(\theta)\,d\theta.$$
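Equation (1) can be evaluated numerically when $\theta$ is one-dimensional. The sketch below is a minimal illustration, assuming a standard normal prior $\Pi(\theta)$, hypothetical 2PL items, and an even grid on $[-6, 6]$ with the trapezoid rule for the marginal $P_n(x_1, \ldots, x_n)$; all of these choices are ours, not part of the model.

```python
import math

def icc(theta, a, b):
    """Hypothetical 2PL curve P_j(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_lik(theta, x, items):
    """L_n(theta) = log P_n(x_1, ..., x_n | theta) under local independence."""
    total = 0.0
    for xj, (a, b) in zip(x, items):
        p = icc(theta, a, b)
        total += math.log(p) if xj == 1 else math.log(1.0 - p)
    return total

def std_normal_pdf(theta):
    """A hypothetical prior density Pi(theta): the standard normal."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)

def posterior_on_grid(x, items, prior=std_normal_pdf, lo=-6.0, hi=6.0, m=2001):
    """Evaluate equation (1) on an even grid. The marginal P_n(x) in the
    denominator is approximated with the trapezoid rule."""
    h = (hi - lo) / (m - 1)
    grid = [lo + i * h for i in range(m)]
    unnorm = [prior(t) * math.exp(log_lik(t, x, items)) for t in grid]
    marginal = h * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return grid, [u / marginal for u in unnorm], marginal

items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]  # hypothetical (a_j, b_j)
grid, post, marginal = posterior_on_grid([1, 1, 0], items)
h = grid[1] - grid[0]
post_mean = h * sum(t * p for t, p in zip(grid, post))
print(marginal, post_mean)
```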


Notice that "prior" and "posterior" refer to the relationship between the distributions and the observations $x_1, \ldots, x_n$: e.g., $\Pi(\theta)$ is prior to $x_1, \ldots, x_n$ and

$$\Pi_n(\theta \mid x_1, \ldots, x_n)$$

is posterior to $x_1, \ldots, x_n$. These ideas extend naturally to the study of the asymptotic behaviour of the posterior distribution. In particular, for each $x_1, \ldots, x_n$, what can be said about the posterior distribution of $\theta$ as $n$ tends to infinity?


It has long been part of IRT folklore that, under the usual empirical Bayes unidimensional IRT modeling approach, the posterior distribution of $\theta$ given the test response is approximately normal for a long test. Holland (1990) remarks:

“At present I know of no thorough discussion of the asymptotic posterior normality of latent variable distributions and this would appear to be an interesting area for further research.”

In classical statistics, when $X_1, \ldots, X_n$ are i.i.d., an important result (informally stated) is that, for $n$ large, the posterior density $\Pi_n(\theta \mid X_1, \ldots, X_n)$ is approximately equal to the normal density $N(\hat\theta_n, \sigma_n^2)$, where $\hat\theta_n$ is the maximum likelihood estimator (MLE) of $\theta$ and $\sigma_n^2 \overset{\text{def}}{=} \{-L_n''(\hat\theta_n)\}^{-1}$, with $L_n''(\hat\theta_n)$ the second derivative of the log-likelihood with respect to $\theta$, evaluated at $\hat\theta_n$. Here $\hat\theta_n$ and $\sigma_n^2$ are functions of $(X_1, \ldots, X_n)$ only. Intuitively, $\sigma_n^2 \to 0$ in applications, usually like $1/n$.


Lindley (1965) proposed a heuristic argument for the above result, expanding the log-likelihood in a Taylor series in $\theta$ about $\hat\theta_n$:

$$L_n(\theta) = L_n(\hat\theta_n) + \tfrac{1}{2}(\theta - \hat\theta_n)^2 L_n''(\hat\theta_n) + R_n,$$

where $R_n$ is a remainder term. Since the log-likelihood has a maximum at $\hat\theta_n$, the first derivative vanishes there. As shown above, the posterior density, viewed as a function of $\theta$ for fixed $x_1, \ldots, x_n$, is proportional to

$$\Pi(\theta)\exp\{L_n(\theta)\}.$$

Therefore,

$$\Pi_n(\theta \mid x_1, \ldots, x_n) \propto \Pi(\theta)\exp\left\{L_n(\hat\theta_n) - \frac{(\theta - \hat\theta_n)^2}{2\sigma_n^2} + R_n\right\}.$$

Since $L_n(\hat\theta_n)$ does not involve $\theta$, it may be absorbed into the omitted constant of proportionality, so that

$$\Pi_n(\theta \mid x_1, \ldots, x_n) \propto \Pi(\theta)\exp\left\{-\frac{(\theta - \hat\theta_n)^2}{2\sigma_n^2} + R_n\right\}, \qquad (2)$$

where the remainder $R_n$ is claimed to be negligible compared with the other term in (2). Because $\sigma_n^2 \to 0$ like $1/n$, the density in (2) becomes concentrated at $\hat\theta_n$ in the limit, thus allowing $\Pi(\theta)$ also to be absorbed into the omitted constant of proportionality. Thus,

$$\Pi_n(\theta \mid x_1, \ldots, x_n) \propto \exp\left\{-\frac{(\theta - \hat\theta_n)^2}{2\sigma_n^2}\right\},$$

as desired. However, Lindley (1965) did not give a rigorous proof.
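As a rough numerical illustration of $\sigma_n^2 \to 0$ like $1/n$, the sketch below finds $\hat\theta_n$ and Lindley's $\sigma_n^2 = \{-L_n''(\hat\theta_n)\}^{-1}$ for a hypothetical 2PL test. For the 2PL, $-L_n''(\theta) = \sum_j a_j^2 P_j(\theta)\{1 - P_j(\theta)\}$ is positive and free of the responses, so the score is strictly decreasing and simple bisection locates the MLE. Repeating a hypothetical 4-item block four times as often should leave $\hat\theta_n$ unchanged and divide $\sigma_n^2$ by four.

```python
import math

def icc(theta, a, b):
    """Hypothetical 2PL curve P_j(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score(theta, x, items):
    """L_n'(theta) for the 2PL: sum_j a_j (x_j - P_j(theta))."""
    return sum(a * (xj - icc(theta, a, b)) for xj, (a, b) in zip(x, items))

def neg_second_derivative(theta, items):
    """-L_n''(theta) for the 2PL: sum_j a_j^2 P_j(theta) (1 - P_j(theta)).
    It is positive and does not depend on the responses x."""
    return sum(a * a * icc(theta, a, b) * (1.0 - icc(theta, a, b)) for a, b in items)

def mle(x, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection on the score, which is strictly decreasing for the 2PL.
    Returns None when no maximizer lies in [lo, hi] (e.g. all 0s or all 1s)."""
    if score(lo, x, items) <= 0.0 or score(hi, x, items) >= 0.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, x, items) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical 4-item block, repeated to lengthen the test.
block_items = [(1.0, -1.0), (1.2, -0.3), (0.9, 0.4), (1.4, 1.0)]
block_x = [1, 1, 0, 0]
for reps in (5, 20):
    items, x = block_items * reps, block_x * reps
    theta_hat = mle(x, items)
    sigma2 = 1.0 / neg_second_derivative(theta_hat, items)  # Lindley's sigma_n^2
    print(len(items), theta_hat, sigma2)
```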

Walker (1969) proved that, under certain conditions, the posterior probability of $\hat\theta_n + a\sigma_n < \theta < \hat\theta_n + b\sigma_n$, namely

$$\int_{\hat\theta_n + a\sigma_n}^{\hat\theta_n + b\sigma_n} \Pi_n(\theta \mid X_1, \ldots, X_n)\,d\theta,$$

converges in probability $P_{\theta_0}$ to

$$(2\pi)^{-1/2}\int_a^b e^{-y^2/2}\,dy$$

as $n \to \infty$. Here, as the notation $P_{\theta_0}$ indicates, in the generation of $X_1, \ldots, X_n$ we assume $\theta_0$ is the true value of $\theta$; that is, $X_1, \ldots, X_n$ is generated according to the distribution $P_n(x_1, \ldots, x_n \mid \theta_0)$. Then, using the rules of conditional probability, it is easy to show that one way to interpret Walker's result is that

$$P[\hat\theta_n + a\sigma_n < \theta_0 < \hat\theta_n + b\sigma_n \mid X_1, \ldots, X_n, \theta_0]$$

converges in probability to

$$(2\pi)^{-1/2}\int_a^b e^{-y^2/2}\,dy$$

as $n \to \infty$. That is, for each fixed (but unknown) $\theta_0$ we have an asymptotic confidence interval for each choice of $a < b$.
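Walker's statement can be checked numerically on a single long test. The sketch below computes the posterior probability of $[\hat\theta_n + a\sigma_n, \hat\theta_n + b\sigma_n]$ by quadrature and compares it with $(2\pi)^{-1/2}\int_a^b e^{-y^2/2}\,dy$. It assumes a hypothetical 200-item 2PL test and a standard normal prior; for the 2PL, $-L_n''(\theta) = \sum_j a_j^2 P_j(\theta)\{1 - P_j(\theta)\}$, which the function `information` computes. The two printed numbers should be close, which illustrates rather than proves the result.

```python
import math

def icc(theta, a, b):
    """Hypothetical 2PL curve P_j(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_lik(theta, x, items):
    """L_n(theta) under local independence."""
    total = 0.0
    for xj, (a, b) in zip(x, items):
        p = icc(theta, a, b)
        total += math.log(p) if xj == 1 else math.log(1.0 - p)
    return total

def information(theta, items):
    """sum_j a_j^2 P_j (1 - P_j), equal to -L_n''(theta) for the 2PL."""
    return sum(a * a * icc(theta, a, b) * (1.0 - icc(theta, a, b)) for a, b in items)

def mle(x, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection on the strictly decreasing 2PL score; assumes a finite MLE exists."""
    def score(t):
        return sum(a * (xj - icc(t, a, b)) for xj, (a, b) in zip(x, items))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def midpoint(f, lo, hi, m):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / m
    return h * sum(f(lo + (i + 0.5) * h) for i in range(m))

def posterior_interval_prob(x, items, a, b):
    """Posterior mass of [theta_hat + a*sigma, theta_hat + b*sigma] under a
    hypothetical N(0, 1) prior, with sigma^2 = 1 / information(theta_hat)."""
    theta_hat = mle(x, items)
    sigma = 1.0 / math.sqrt(information(theta_hat, items))
    ref = log_lik(theta_hat, x, items)  # rescaling constant; cancels in the ratio

    def dens(t):  # prior times likelihood, up to a constant factor
        return math.exp(-0.5 * t * t + log_lik(t, x, items) - ref)

    inside = midpoint(dens, theta_hat + a * sigma, theta_hat + b * sigma, 400)
    total = midpoint(dens, -8.0, 8.0, 4000)
    return inside / total

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical 200-item test: a 4-item block repeated 50 times, half answered correctly.
items = [(1.0, -1.0), (1.2, -0.3), (0.9, 0.4), (1.4, 1.0)] * 50
x = [1, 1, 0, 0] * 50
print(posterior_interval_prob(x, items, -1.0, 1.0),
      std_normal_cdf(1.0) - std_normal_cdf(-1.0))
```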

As we know, in all realistic applications the item characteristic curves are not identical. Therefore, the $\{X_j\}$ we have are merely independent, conditional on $\theta$, but not identically distributed. Nevertheless, for the general IRT model we can prove the following, by adapting the approach that Walker (1969) applied to i.i.d. random variables:

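(a) The "weak" convergence: for $-\infty < a < b < \infty$,

$$A_n = \int_{\hat\theta_n + a\hat\sigma_n}^{\hat\theta_n + b\hat\sigma_n} \Pi_n(\theta \mid X_1, \ldots, X_n)\,d\theta,$$

where $\hat\sigma_n^2 = \{I^{(n)}(\hat\theta_n)\}^{-1}$ is the variance estimate defined in (5) below, converges in probability $P_{\theta_0}$ to

$$A = (2\pi)^{-1/2}\int_a^b e^{-y^2/2}\,dy$$

as $n \to \infty$. That is,

$$P_{\theta_0}\{|A_n - A| < \epsilon\} \to 1 \quad \text{as } n \to \infty, \text{ for arbitrary } \epsilon > 0.$$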

(b) The strong convergence of $A_n$: that is,

$$P_{\theta_0}\left\{\lim_{n\to\infty} A_n = A\right\} = 1;$$

(c) Convergence in "manifest" probability, or "$\theta_0$-free" convergence: $A_n$ converges to $A$ in the manifest (or marginal, in the sense that $\theta_0$ is integrated out) probability $P$, which is defined, for any fixed $n$, by

$$P\{(X_1, \ldots, X_n) = (x_1, \ldots, x_n)\} = \int_\Theta P_n(x_1, \ldots, x_n \mid \theta)\,\Pi(\theta)\,d\theta.$$

This result is also easily interpretable as an asymptotic confidence interval for ability. That is, it assures that

$$P\{\hat\theta_n + a\hat\sigma_n < \theta < \hat\theta_n + b\hat\sigma_n \mid X_1, \ldots, X_n\}$$

converges in probability to

$$(2\pi)^{-1/2}\int_a^b e^{-y^2/2}\,dy$$

as $n \to \infty$. That is, for any randomly sampled examinee, we have an asymptotic confidence interval for each choice of $a < b$. Here in (c), in contrast to (a), the value of $\theta$ for the randomly sampled examinee is not fixed.

(d) The weak and strong consistency of the MLE $\hat\theta_n$, which are intermediate results in the proofs of (a) and (b).

Proving  (a)-(c)  is  the  main  purpose  of  this  paper,  thereby  meeting  the  Holland 
challenge  quoted  above. 
2  Further  Notation  and  Assumptions

2.1  Basic  Notation 


$\theta_0$: The true parameter. In saying that $X_j$ is a random variable we imply that $X_j$ has the density

$$P_j(\theta)^{x_j}\{1 - P_j(\theta)\}^{1 - x_j}, \qquad x_j = 0, 1,$$

for some fixed value of $\theta$. Denote this value by $\theta_0$, which we refer to as the true parameter.

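$\hat\theta_n$: The maximum likelihood estimator (MLE) of $\theta$, which is defined as a solution (in general non-unique) of

$$P_n(X_1, \ldots, X_n \mid \hat\theta_n) = \max_{\theta \in \Theta}\{P_n(X_1, \ldots, X_n \mid \theta)\}, \qquad (3)$$

if it exists, or equivalently, of

$$L_n(\hat\theta_n) = \max_{\theta \in \Theta}\{L_n(\theta)\}. \qquad (4)$$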


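$I_j(\theta)$: The item information function of item $j$, which is equal to

$$I_j(\theta) = \frac{\{P_j'(\theta)\}^2}{P_j(\theta)\{1 - P_j(\theta)\}},$$

where $P_j'(\theta)$ is the first derivative of $P_j(\theta)$ with respect to $\theta$.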


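$I^{(n)}(\theta)$: The test information function

$$I^{(n)}(\theta) = \sum_{j=1}^{n} I_j(\theta).$$

$\hat\sigma_n^2$: The variance estimate

$$\hat\sigma_n^2 \overset{\text{def}}{=} \{I^{(n)}(\hat\theta_n)\}^{-1}, \qquad (5)$$

noting that our definition of $\hat\sigma_n^2$, used hereafter in the paper, differs from the often used $\sigma_n^2 \overset{\text{def}}{=} \{-L_n''(\hat\theta_n)\}^{-1}$ mentioned above.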
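For concreteness, the sketch below evaluates $I_j(\theta)$, $I^{(n)}(\theta)$ and the variance estimate (5) for hypothetical three-parameter logistic (3PL) items $P_j(\theta) = c_j + (1 - c_j)/[1 + \exp\{-a_j(\theta - b_j)\}]$, using the closed-form derivative $P_j'(\theta) = (1 - c_j)a_j\psi_j(1 - \psi_j)$, where $\psi_j$ is the logistic part. The value `theta_hat = 0.3` is a placeholder, not an estimate from data.

```python
import math

def icc3(theta, a, b, c):
    """Hypothetical 3PL curve P_j(theta) = c + (1 - c) / (1 + exp(-a (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def icc3_deriv(theta, a, b, c):
    """P_j'(theta) for the 3PL: (1 - c) a psi (1 - psi), psi being the logistic part."""
    psi = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return (1.0 - c) * a * psi * (1.0 - psi)

def item_info(theta, a, b, c):
    """I_j(theta) = P_j'(theta)^2 / [P_j(theta) {1 - P_j(theta)}]."""
    p = icc3(theta, a, b, c)
    return icc3_deriv(theta, a, b, c) ** 2 / (p * (1.0 - p))

def total_info(theta, items):
    """I^(n)(theta): the sum of the item information functions."""
    return sum(item_info(theta, a, b, c) for a, b, c in items)

items = [(1.2, -0.5, 0.2), (0.8, 0.0, 0.25), (1.5, 1.0, 0.15)]  # hypothetical (a, b, c)
theta_hat = 0.3  # placeholder for an MLE computed elsewhere
sigma2_hat = 1.0 / total_info(theta_hat, items)  # the variance estimate (5)
print(sigma2_hat)
```

With $c_j = 0$ the item information reduces to the familiar 2PL form $a_j^2 P_j(\theta)\{1 - P_j(\theta)\}$.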
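$\lambda_j(\theta)$: The logit function of item $j$,

$$\lambda_j(\theta) = \log\left\{\frac{P_j(\theta)}{1 - P_j(\theta)}\right\}. \qquad (6)$$

$Z_j(\theta)$: The log-likelihood ratio of item $j$ at $\theta$ versus $\theta_0$,

$$Z_j(\theta) = \log\left[\frac{P_j(\theta)^{X_j}\{1 - P_j(\theta)\}^{1 - X_j}}{P_j(\theta_0)^{X_j}\{1 - P_j(\theta_0)\}^{1 - X_j}}\right]. \qquad (7)$$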
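Definitions (6) and (7) translate directly into code. The sketch below, assuming hypothetical 2PL items, checks numerically that $\sum_j Z_j(\theta) = L_n(\theta) - L_n(\theta_0)$ for a given response pattern, and that $Z_j(\theta)$ can be rewritten through the logit as $X_j\{\lambda_j(\theta) - \lambda_j(\theta_0)\} + \log[\{1 - P_j(\theta)\}/\{1 - P_j(\theta_0)\}]$.

```python
import math

def icc(theta, a, b):
    """Hypothetical 2PL curve P_j(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def logit_fn(theta, a, b):
    """lambda_j(theta) = log{P_j(theta) / (1 - P_j(theta))}, equation (6)."""
    p = icc(theta, a, b)
    return math.log(p / (1.0 - p))

def z_term(xj, theta, theta0, a, b):
    """Z_j(theta), equation (7): log-likelihood ratio of item j at theta vs theta0."""
    p, p0 = icc(theta, a, b), icc(theta0, a, b)
    return xj * math.log(p / p0) + (1 - xj) * math.log((1.0 - p) / (1.0 - p0))

def log_lik(theta, x, items):
    """L_n(theta) under local independence."""
    return sum(math.log(icc(theta, a, b)) if xj == 1 else math.log(1.0 - icc(theta, a, b))
               for xj, (a, b) in zip(x, items))

items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0), (1.2, 0.5)]  # hypothetical (a_j, b_j)
x, theta, theta0 = [1, 0, 1, 0], 0.7, -0.2
print(sum(z_term(xj, theta, theta0, a, b) for xj, (a, b) in zip(x, items)),
      log_lik(theta, x, items) - log_lik(theta0, x, items))
```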


2.2  Regularity  Conditions 

Some "regularity" conditions and their explanations will be stated before going into the details of our theorems. Fix $\theta_0 \in \Theta$. There are five basic assumptions:


(A1): Let $\theta \in \Theta$, where $\Theta$ is $(-\infty, \infty)$ or a bounded or unbounded interval in $(-\infty, \infty)$. Let the prior density $\Pi(\theta)$ be continuous and positive at $\theta_0$, where $\theta_0$ is assumed to be the true value of $\theta$.

(A2): $P_j(\theta)$ is twice continuously differentiable, and $P_j'(\theta)$ and $P_j''(\theta)$ are bounded in absolute value uniformly with respect to both $\theta$ and $j$ in some closed interval $N_0$ about $\theta_0 \in \Theta$.


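(A3): For every fixed $\theta \ne \theta_0$, assume that for some given $c(\theta) > 0$

$$\limsup_{n\to\infty}\; n^{-1}\sum_{j=1}^{n} E_{\theta_0} Z_j(\theta) \le -c(\theta) \qquad (8)$$

and

$$\sup_j |\lambda_j(\theta)| < \infty.$$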

(See Footnote 1.) Note that

$$L_n(\theta) = \sum_{j=1}^{n}\left[X_j \log P_j(\theta) + (1 - X_j)\log\{1 - P_j(\theta)\}\right]. \qquad (9)$$

¹For a sequence of real numbers $\{a_n\}$, if $\lim_{n\to\infty} a_n$ does not exist, then $\{a_n\}$ must have more than one limit point. $\limsup_{n\to\infty} a_n$ denotes the largest limit point (or upper limit).
(A4): $\{I_j(\theta)\}$, $\{\lambda_j'(\theta)\}$ and $\{\lambda_j''(\theta)\}$ are bounded in absolute value uniformly in $j$ and in $\theta \in N_0$, with $N_0$ as specified in (A2) above.

(A5):

$$\liminf_{n\to\infty} \frac{I^{(n)}(\theta_0)}{n} > c(\theta_0) > 0.$$

That is, asymptotically, the average information at $\theta_0$ is bounded away from 0.

Although $\Theta$ may be $(-\infty, \infty)$, we always assume without loss of generality that $\theta_0$ is contained in a finite interval, e.g. $[-a, a]$ for some fixed $a > 0$. This is because, from the psychometric viewpoint (taking var$(\theta) = 1$ for convenience), the same educational decision is made about people with $\theta = 4$ and people with $\theta = 24$. Thus, assuming $-5 < \theta < 5$ does no practical damage.


Condition (8) of assumption (A3) perhaps looks unfamiliar, but it plays an important role in the proof of Lemma 3.1 below, ensuring the identifiability of $\theta_0$. That is, when $\theta_0$ is the true value of $\theta$, $E\{L_n(\theta) - L_n(\theta_0)\}$ should be sufficiently negative for all values of $\theta \ne \theta_0$. In other words, this condition allows us to "identify" $\theta_0$ by maximizing the likelihood function. (A3) acts as a remedy in the case that the $\{X_j\}$ are merely independent but not identically distributed; if they are i.i.d., as in Walker's proof, then (A3) is automatically satisfied. To see this, note that in the i.i.d. case

$$n^{-1}\sum_{j=1}^{n} E_{\theta_0}\{Z_j(\theta)\} = E_{\theta_0}\{Z_1(\theta)\}.$$


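Note that

$$E_{\theta_0}\exp\{Z_1(\theta)\} = P_1(\theta_0)\frac{P_1(\theta)}{P_1(\theta_0)} + \{1 - P_1(\theta_0)\}\frac{1 - P_1(\theta)}{1 - P_1(\theta_0)} = 1.$$

Thus, since $-\log x$ is strictly convex, Jensen's inequality (Lehmann, p. 50) shows that for arbitrary $\theta \ne \theta_0$

$$E_{\theta_0} Z_1(\theta) = E_{\theta_0}[\log\{Y(\theta)\}] < \log\{E_{\theta_0}[Y(\theta)]\} = 0, \qquad (10)$$

where

$$Y(\theta) = \exp\{Z_1(\theta)\}.$$

Thus (8) is satisfied by taking

$$c(\theta) = -E_{\theta_0}\{Z_1(\theta)\}.$$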
Unfortunately, the $\{Z_j(\theta)\}$ in IRT models are not identically distributed, so we have to impose a supplementary condition. According to (10), $n^{-1}\sum_{j=1}^{n} E_{\theta_0} Z_j(\theta)$ will be negative; however, this alone does not yield (8). For what classes of IRT models, then, does (8) hold? Consider the case in which each $E_{\theta_0} Z_j(\theta)$ satisfies, for some $c(\theta)$,

$$E_{\theta_0} Z_j(\theta) \le -c(\theta) < 0. \qquad (11)$$

Then (8) obviously holds. However, this condition is stronger than needed. It would suffice to require merely that a "certain proportion" of the $E_{\theta_0} Z_j(\theta)$'s satisfy condition (11), say one in every $K$, no matter how large $K$ is. Mathematically speaking, since every term is nonpositive by (10), this would imply

$$\sum_{j=1}^{n} E_{\theta_0} Z_j(\theta) \le -\left\lfloor \frac{n}{K} \right\rfloor c(\theta) < 0,$$

and so

$$\limsup_{n\to\infty}\; n^{-1}\sum_{j=1}^{n} E_{\theta_0} Z_j(\theta) \le -\frac{c(\theta)}{K} < 0,$$

which is (8) with $c(\theta)/K$ in place of $c(\theta)$.
n— *°°  j=i 

Actually, (8) does not seem very restrictive for IRT models encountered in practice. As evidence, consider a "typical" IRT model of 40 3PL items, in which the item parameters are precalibrated from a real ACT math test. The curves in Figure 1 are the $E_{\theta_0} Z_j(\theta)$'s computed from this model; clearly (8) appears to hold.
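The sign argument behind (10) is easy to check numerically: $E_{\theta_0}Z_j(\theta)$, written out in (30), is minus a Kullback-Leibler divergence, so every item contributes a nonpositive term and the item average stays negative for $\theta \ne \theta_0$. The sketch below uses 40 hypothetical 3PL items with invented parameters; they are not the ACT calibration behind Figure 1, so the sketch only mimics the kind of curves described there.

```python
import math

def icc3(theta, a, b, c):
    """Hypothetical 3PL curve P_j(theta)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def expected_z(theta, theta0, item):
    """E_{theta0} Z_j(theta). This equals minus the Kullback-Leibler divergence
    between the item-j response laws at theta0 and at theta, so it is <= 0."""
    p, p0 = icc3(theta, *item), icc3(theta0, *item)
    return p0 * math.log(p / p0) + (1.0 - p0) * math.log((1.0 - p) / (1.0 - p0))

def average_expected_z(theta, theta0, items):
    """n^{-1} sum_j E_{theta0} Z_j(theta), the quantity bounded in condition (8)."""
    return sum(expected_z(theta, theta0, item) for item in items) / len(items)

# 40 hypothetical 3PL items (a_j, b_j, c_j); these are NOT the ACT parameters of Figure 1.
items = [(0.6 + 0.1 * (j % 10), -2.0 + 0.1 * j, 0.2) for j in range(40)]
theta0 = 0.0
for theta in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0):
    print(theta, average_expected_z(theta, theta0, items))
```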


(A4) and (A5) are used to make $L_n''(\theta)$ behave sufficiently well for $\theta$ near $\theta_0$. Condition (A5) implies that the test information function evaluated at $\theta_0$ tends to infinity at the same rate as $n$. These five conditions would not be difficult to verify in particular applications and hence are really fairly mild modeling assumptions.

Figure 1: $E_{\theta_0}\{Z_j(\theta)\}$'s for 40 items, ACT-MATH Test (Drasgow, 1987).


3  The  Main  Theorems 

In  this  section  we  will  introduce  three  theorems  and  the  major  steps  of  the  proof 
of  Theorem  3.1,  the  basic  theorem.  The  rigorous  proofs  of  these  theorems,  as  well  as 
their  related  lemmas  and  corollaries,  are  contained  in  an  appendix. 

3.1  Convergence  in  Probability 

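Theorem 3.1 Suppose that conditions (A1) through (A5) hold. Let $\hat\theta_n$ be an MLE of $\theta_0$, and $\hat\sigma_n$ be the square root of $\{I^{(n)}(\hat\theta_n)\}^{-1}$. Then, for $-\infty < a < b < \infty$, the posterior probability of $\hat\theta_n + a\hat\sigma_n < \theta < \hat\theta_n + b\hat\sigma_n$, namely

$$A_n = \int_{\hat\theta_n + a\hat\sigma_n}^{\hat\theta_n + b\hat\sigma_n} \Pi_n(\theta \mid X_1, \ldots, X_n)\,d\theta,$$

tends in $P_{\theta_0}$ to

$$A = (2\pi)^{-1/2}\int_a^b e^{-u^2/2}\,du$$

as $n \to \infty$.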


Theorem 3.1 is the basic result in our asymptotic posterior normality work. Note that $A_n$ is a random variable depending on $X_1, \ldots, X_n$. Thus its distribution is determined by the parameter $\theta_0$, and $A_n \to A$ in $P_{\theta_0}$ means

$$\lim_{n\to\infty} P_{\theta_0}\{|A_n - A| < \epsilon\} = 1, \quad \text{for arbitrary } \epsilon > 0.$$

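Outline of Proof. To prove the theorem, write

$$\int_{\hat\theta_n + a\hat\sigma_n}^{\hat\theta_n + b\hat\sigma_n} \Pi_n(\theta \mid X_1, \ldots, X_n)\,d\theta = \frac{G}{P_n(X_1, \ldots, X_n)} = \frac{G}{P_n(X_1, \ldots, X_n \mid \hat\theta_n)\hat\sigma_n}\left(\frac{P_n(X_1, \ldots, X_n)}{P_n(X_1, \ldots, X_n \mid \hat\theta_n)\hat\sigma_n}\right)^{-1},$$

where

$$G = \int_{\hat\theta_n + a\hat\sigma_n}^{\hat\theta_n + b\hat\sigma_n} \Pi(\theta)P_n(X_1, \ldots, X_n \mid \theta)\,d\theta, \qquad (12)$$

and

$$P_n(X_1, \ldots, X_n) = \int_\Theta \Pi(\theta)P_n(X_1, \ldots, X_n \mid \theta)\,d\theta.$$

It suffices to prove

$$\frac{P_n(X_1, \ldots, X_n)}{P_n(X_1, \ldots, X_n \mid \hat\theta_n)\hat\sigma_n} \to (2\pi)^{1/2}\Pi(\theta_0) \qquad (13)$$

as $n \to \infty$, in $P_{\theta_0}$, and

$$\frac{G}{P_n(X_1, \ldots, X_n \mid \hat\theta_n)\hat\sigma_n} \to (2\pi)^{1/2}\Pi(\theta_0)\{\Phi(a) - \Phi(b)\} \qquad (14)$$

as $n \to \infty$, in $P_{\theta_0}$, where $\Phi(x) = (2\pi)^{-1/2}\int_x^{\infty} e^{-u^2/2}\,du$.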
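In the following we present the general idea of the proof of (13); (14) is proved by a similar method. First expand $L_n(\theta)$ about $\hat\theta_n$ by Taylor expansion; since $L_n'(\hat\theta_n) = 0$, we have

$$L_n(\theta) - L_n(\hat\theta_n) = \frac{(\theta - \hat\theta_n)^2}{2}L_n''(\theta^*) = -\frac{(\theta - \hat\theta_n)^2}{2\hat\sigma_n^2}\{1 - R_n(\theta, X_1, \ldots, X_n)\}, \qquad (15)$$

where $\theta^*$ is a point between $\theta$ and $\hat\theta_n$, $\hat\sigma_n^2$ is defined by (5), and $R_n$ is defined by

$$R_n \overset{\text{def}}{=} R_n(\theta, X_1, \ldots, X_n) = 1 + \hat\sigma_n^2 L_n''(\theta^*). \qquad (16)$$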

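Split $P_n(X_1, \ldots, X_n)$ into two parts as follows:

$$P_n(X_1, \ldots, X_n) = \int_{|\theta - \theta_0| \ge \delta} \Pi(\theta)P_n(X_1, \ldots, X_n \mid \theta)\,d\theta + \int_{|\theta - \theta_0| < \delta} \Pi(\theta)P_n(X_1, \ldots, X_n \mid \theta)\,d\theta \overset{\text{def}}{=} G_1 + G_2.$$

Therefore, recalling that $L_n(\theta) = \log P_n(X_1, \ldots, X_n \mid \theta)$,

$$\frac{G_1}{P_n(X_1, \ldots, X_n \mid \hat\theta_n)\hat\sigma_n} = \frac{\int_{|\theta - \theta_0| \ge \delta} \Pi(\theta)P_n(X_1, \ldots, X_n \mid \theta)\,d\theta}{P_n(X_1, \ldots, X_n \mid \hat\theta_n)\hat\sigma_n} \qquad (17)$$

$$= \exp\{L_n(\theta_0) - L_n(\hat\theta_n)\}\{I^{(n)}(\hat\theta_n)\}^{1/2}\int_{|\theta - \theta_0| \ge \delta} \Pi(\theta)\exp\{L_n(\theta) - L_n(\theta_0)\}\,d\theta, \qquad (18)$$

and, using (15),

$$\frac{G_2}{P_n(X_1, \ldots, X_n \mid \hat\theta_n)\hat\sigma_n} = \frac{\Pi(\theta_0)}{\hat\sigma_n}\int_{|\theta - \theta_0| < \delta} \frac{\Pi(\theta)}{\Pi(\theta_0)}\exp\left\{-\frac{(\theta - \hat\theta_n)^2}{2\hat\sigma_n^2}(1 - R_n)\right\}d\theta. \qquad (19)$$

Thus, if

$$\frac{G_1}{P_n(X_1, \ldots, X_n \mid \hat\theta_n)\hat\sigma_n} \to 0 \quad \text{in } P_{\theta_0} \qquad (20)$$

and

$$\frac{G_2}{P_n(X_1, \ldots, X_n \mid \hat\theta_n)\hat\sigma_n} \to (2\pi)^{1/2}\Pi(\theta_0) \quad \text{in } P_{\theta_0}, \qquad (21)$$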
then (13) holds. To establish (20), first consider (18). Since $\hat\theta_n$ maximizes $L_n$, the factor $\exp\{L_n(\theta_0) - L_n(\hat\theta_n)\}$ never exceeds 1. On the other hand, since $\{I^{(n)}(\hat\theta_n)\}^{1/2}$ grows like $n^{1/2}$, we need $L_n(\theta) - L_n(\theta_0)$ to be "sufficiently negative" that the integral in (18) approaches 0 faster than $n^{-1/2}$; the contribution from outside the $\delta$-neighbourhood of $\theta_0$ can then be neglected. To establish (21), consider (19). Since $\Pi(\theta)$ is continuous, $\Pi(\theta)/\Pi(\theta_0)$ will be close to one for $\delta$ sufficiently small, and we need to make $R_n$ "sufficiently small" inside the $\delta$-neighbourhood so that we can estimate the integral in (19) by

$$\frac{1}{\hat\sigma_n}\int_{-\infty}^{\infty}\exp\left\{-\frac{(\theta - \hat\theta_n)^2}{2\hat\sigma_n^2}\right\}d\theta = (2\pi)^{1/2}.$$

Mathematically speaking, we need the following two lemmas.

Lemma 3.1 Suppose that conditions (A1) through (A3) hold. For any $\delta > 0$, there exists $k(\delta) > 0$ such that

$$\lim_{n\to\infty} P_{\theta_0}\left\{\sup_{|\theta - \theta_0| \ge \delta} n^{-1}[L_n(\theta) - L_n(\theta_0)] < -k(\delta)\right\} = 1.$$

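Lemma 3.2 Suppose that conditions (A1) through (A5) hold. Then

$$L_n(\theta) - L_n(\hat\theta_n) = \frac{(\theta - \hat\theta_n)^2}{2}L_n''(\theta^*) = -\frac{(\theta - \hat\theta_n)^2}{2\hat\sigma_n^2}\{1 - R_n(\theta, X_1, \ldots, X_n)\}, \qquad (22)$$

where $\theta^*$ is a point between $\theta$ and $\hat\theta_n$, and $R_n$ is defined by (16). Also, for any $\epsilon > 0$, there exists $\delta$ such that

$$\lim_{n\to\infty} P_{\theta_0}\left\{\sup_{|\theta - \theta_0| < \delta} |R_n(\theta, X_1, \ldots, X_n)| < \epsilon\right\} = 1. \qquad (23)$$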

As a by-product, Lemma 3.1 ensures the consistency of the MLE $\hat\theta_n$, which is stated as Corollary 3.1.

Corollary 3.1 Suppose that conditions (A1) through (A3) hold. Then $\hat\theta_n$ is weakly consistent, namely

$$\lim_{n\to\infty} \hat\theta_n = \theta_0 \quad \text{in } P_{\theta_0}. \qquad (24)$$
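Corollary 3.1 can be illustrated by simulation. The sketch below, assuming non-identical 2PL items with $a_j$ drawn uniformly from $[0.7, 1.3]$ and $b_j$ from $[-2, 2]$ (hypothetical choices), generates responses at a true $\theta_0 = 0.5$ for increasing $n$ and reports the MLE; the estimates should move toward $\theta_0$ as $n$ grows. A simulation of this kind illustrates, but does not prove, weak consistency.

```python
import math
import random

def icc(theta, a, b):
    """Hypothetical 2PL curve P_j(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle(x, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection on the 2PL score sum_j a_j (x_j - P_j(theta)), which is strictly
    decreasing in theta. Returns None if no root lies inside [lo, hi]."""
    def score(t):
        return sum(a * (xj - icc(t, a, b)) for xj, (a, b) in zip(x, items))
    if score(lo) <= 0.0 or score(hi) >= 0.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate_mle(theta0, n, rng):
    """Draw X_1..X_n at the true theta0 from n hypothetical, non-identical 2PL
    items and return the MLE (or None if it is not finite)."""
    items = [(rng.uniform(0.7, 1.3), rng.uniform(-2.0, 2.0)) for _ in range(n)]
    x = [1 if rng.random() < icc(theta0, a, b) else 0 for a, b in items]
    return mle(x, items)

rng = random.Random(12345)
for n in (25, 100, 400, 1600):
    print(n, simulate_mle(0.5, n, rng))
```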
It can be shown that (22) of Lemma 3.2 makes it possible for us to use the reciprocal of the test information as the variance estimate (see (5)), instead of

$$\sigma_n^2 = \{-L_n''(\hat\theta_n)\}^{-1},$$

as Lindley (1965) and Walker (1969) each suggested. The variance estimate (5) that we have chosen has the following advantages:

•  The information function $I^{(n)}(\cdot)$ is always positive. By contrast, $-L_n''(\cdot)$ can be negative, especially when the sample size (here, the test length $n$) is not large enough, so $\{-L_n''(\cdot)\}^{-1/2}$ sometimes does not exist.

•  The information function is easier to calculate, while the calculation of $L_n''(\cdot)$ is more complicated.

Future study should be undertaken to compare the speeds of convergence and to explore any further advantages.
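The first advantage can be seen in a one-item example. For the 2PL, $-L_n''(\theta)$ and $I^{(n)}(\theta)$ coincide, but for the 3PL they differ. The sketch below, assuming a hypothetical 3PL item with $(a, b, c) = (1.5, 0, 0.2)$ answered correctly, evaluates both at $\theta = -3$: there $-L_n''(\theta)$ is negative while $I^{(n)}(\theta)$ is positive. The helper names are illustrative only.

```python
import math

def icc3_parts(theta, a, b, c):
    """P_j, P_j' and P_j'' for a hypothetical 3PL item at theta."""
    psi = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    p = c + (1.0 - c) * psi
    d1 = (1.0 - c) * a * psi * (1.0 - psi)
    d2 = d1 * a * (1.0 - 2.0 * psi)
    return p, d1, d2

def observed_info(theta, x, items):
    """-L_n''(theta), which depends on the responses x."""
    total = 0.0
    for xj, (a, b, c) in zip(x, items):
        p, d1, d2 = icc3_parts(theta, a, b, c)
        if xj == 1:
            total += (d1 / p) ** 2 - d2 / p                  # -(log P_j)''
        else:
            total += (d1 / (1.0 - p)) ** 2 + d2 / (1.0 - p)  # -(log(1 - P_j))''
    return total

def total_info(theta, items):
    """I^(n)(theta) = sum_j P_j'^2 / [P_j (1 - P_j)], which is always positive."""
    total = 0.0
    for a, b, c in items:
        p, d1, _ = icc3_parts(theta, a, b, c)
        total += d1 * d1 / (p * (1.0 - p))
    return total

# One hypothetical 3PL item answered correctly, evaluated at a low ability.
item = [(1.5, 0.0, 0.2)]
print(observed_info(-3.0, [1], item), total_info(-3.0, item))
```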

3.2  Convergence  Almost  Surely 

As discussed in the preceding subsection, the posterior distribution of $(\theta - \hat\theta_n)/\hat\sigma_n$, derived from a proper prior density $\Pi(\theta)$, converges in probability to the standard normal distribution. In this subsection we will see that a stronger result, almost sure convergence (also referred to as strong, almost everywhere, or with-probability-one convergence), can be achieved under the same assumptions.

Theorem 3.2 Suppose that conditions (A1) through (A5) hold. Let $\hat\theta_n$ be an MLE of $\theta_0$, and $\hat\sigma_n$ be the square root of $\{I^{(n)}(\hat\theta_n)\}^{-1}$. Then, for $-\infty < a < b < \infty$, the posterior probability of $\hat\theta_n + a\hat\sigma_n < \theta < \hat\theta_n + b\hat\sigma_n$, namely

$$A_n = \int_{\hat\theta_n + a\hat\sigma_n}^{\hat\theta_n + b\hat\sigma_n} \Pi_n(\theta \mid X_1, \ldots, X_n)\,d\theta,$$

tends to

$$A = (2\pi)^{-1/2}\int_a^b e^{-u^2/2}\,du$$

almost surely, as $n \to \infty$.


What is the difference between the conclusions of Theorem 3.1 and Theorem 3.2? It is instructive to look at the following two statements, which are equivalent to these two theorems respectively:

•  The sequence $\{A_n\}$ is said to converge in probability $P_{\theta_0}$ to $A$ if and only if, for each $\epsilon > 0$,

$$\lim_{n\to\infty} P_{\theta_0}\{|A_n - A| > \epsilon\} = 0,$$

or equivalently

$$\lim_{n\to\infty} P_{\theta_0}\{|A_n - A| \le \epsilon\} = 1. \qquad (25)$$

•  The sequence $\{A_n\}$ is said to converge to $A$ almost surely (or with probability one, strongly, almost everywhere, etc.) if and only if, for each $\epsilon > 0$,

$$\lim_{n\to\infty} P_{\theta_0}\left\{\sup_{m \ge n} |A_m - A| < \epsilon\right\} = 1. \qquad (26)$$

Since (26) clearly implies (25), we have the immediate conclusion that Theorem 3.2 implies Theorem 3.1.

To gain a better understanding of almost sure convergence, it is interesting to quote the following example by Stout (1974, p. 9):

“In statistics there are certain situations where almost sure convergence seems a more relevant concept than convergence in probability. Consider a physician who treats patients with a drug having the same unknown cure probability of p for each patient. The physician is willing to continue use of the drug as long as no superior drug is found. Along with administering the drug, he estimates the cure probability from time to time by dividing the number of cures up to that point in time by the number of patients treated. If n is the number of patients treated, denote this estimating random variable by X(n). Suppose the physician wishes to estimate p within a prescribed tolerance $\epsilon > 0$. He asks whether he will ever reach a point in time such that with high probability, all subsequent estimates will fall within $\epsilon$ of p. That is, he wonders for prescribed $\delta > 0$ whether there exists an integer N such that

$$P\left\{\max_{n \ge N} |X(n) - p| < \epsilon\right\} > 1 - \delta.$$

The weak law of large numbers says only that

$$P\{|X(n) - p| < \epsilon\} \to 1 \quad \text{as } n \to \infty$$

and hence does not answer his question. It is only by the strong law of large numbers that the existence of such an N is indeed guaranteed.”
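Stout's distinction can be mimicked with a short simulation. Assuming a hypothetical cure probability $p = 0.3$, tolerance $\epsilon = 0.05$ and $N = 5000$, the sketch compares the single deviation $|X(N) - p|$, which is what the weak law controls, with the whole tail $\max_{N \le n \le 50000}|X(n) - p|$, which is what the strong law speaks to (truncated here at a finite horizon).

```python
import random

def running_estimates(p, n_max, rng):
    """X(n) = (number of cures among the first n patients) / n, for n = 1..n_max."""
    cures, estimates = 0, []
    for n in range(1, n_max + 1):
        if rng.random() < p:
            cures += 1
        estimates.append(cures / n)
    return estimates

p, eps, big_n = 0.3, 0.05, 5000  # hypothetical cure rate, tolerance, and N
est = running_estimates(p, 50000, random.Random(7))
single = abs(est[big_n - 1] - p)                 # the quantity the weak law controls
tail = max(abs(e - p) for e in est[big_n - 1:])  # the quantity the strong law controls
print(single, tail)
```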

3.3  Convergence  in  Manifest  Probability 

Some readers may find it confusing to have $\theta$ fixed at $\theta_0$ and, simultaneously, $\theta$ be a random variable governed by $\Pi(\theta)$, as is the case in Theorems 3.1 and 3.2. Thus some clarification seems needed. The idea that leads to the adoption of the notation $\theta_0$ is the following: for any given response vector

$$(X_1, \ldots, X_n) = (x_1, \ldots, x_n),$$

if it comes from a randomly selected examinee we can always assume that he or she has a specific ability, say $\theta_0$. However, in most cases $\theta_0$ is unknown but hypothetically specified. Under this assumption, the distribution of $X_1, \ldots, X_n$ is induced by $\theta_0$. On the other hand, the given $x_1, \ldots, x_n$ can also be interpreted simply as a pattern.
Our interest is in the proportion of examinees in the population who would produce the response vector $x_1, \ldots, x_n$. Denote this proportion by

$$P\{(X_1, \ldots, X_n) = (x_1, \ldots, x_n)\} \qquad (27)$$

and call it the manifest probability. Clearly

$$P\{(X_1, \ldots, X_n) = (x_1, \ldots, x_n)\} > 0$$

and

$$\sum_{x_1, \ldots, x_n} P\{(X_1, \ldots, X_n) = (x_1, \ldots, x_n)\} = 1.$$

Since we know the prior density $\Pi(\theta)$, (27) can be obtained by integrating the joint probability with respect to $\theta$, that is,

$$P\{(X_1, \ldots, X_n) = (x_1, \ldots, x_n)\} = \int_\Theta P_n(x_1, \ldots, x_n \mid \theta)\,\Pi(\theta)\,d\theta.$$
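For a short test the manifest probability can be computed directly. The sketch below, assuming a standard normal prior $\Pi(\theta)$ and four hypothetical 2PL items, integrates $P_n(x_1, \ldots, x_n \mid \theta)\Pi(\theta)$ with the midpoint rule on $[-8, 8]$ for each of the $2^4$ patterns; the results should be positive and sum to 1 up to quadrature error, matching the two properties displayed above.

```python
import math

def icc(theta, a, b):
    """Hypothetical 2PL curve P_j(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pattern_prob(x, theta, items):
    """P_n(x_1, ..., x_n | theta) under local independence."""
    prob = 1.0
    for xj, (a, b) in zip(x, items):
        p = icc(theta, a, b)
        prob *= p if xj == 1 else 1.0 - p
    return prob

def manifest_prob(x, items, lo=-8.0, hi=8.0, m=2000):
    """Integral of P_n(x | theta) Pi(theta) over theta, using a hypothetical N(0, 1)
    prior for Pi and the composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / m
    total = 0.0
    for i in range(m):
        t = lo + (i + 0.5) * h
        prior = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        total += pattern_prob(x, t, items) * prior
    return h * total

items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0), (1.2, 0.5)]  # hypothetical (a_j, b_j)
patterns = [[(k >> i) & 1 for i in range(len(items))] for k in range(2 ** len(items))]
probs = [manifest_prob(x, items) for x in patterns]
print(sum(probs))
```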

According to Theorem 3.1,

$$\int_{\hat\theta_n + a\hat\sigma_n}^{\hat\theta_n + b\hat\sigma_n} \Pi_n(\theta \mid X_1, \ldots, X_n)\,d\theta \to \Phi(a) - \Phi(b) \qquad (28)$$

in probability $P_{\theta_0}$. It is very interesting to notice that the right-hand side of (28) is free of $\theta_0$, which suggests that we can further prove that the convergence is "free of $\theta_0$". Since (28) holds for "every" $\theta_0$, intuitively speaking, (28) should also hold under the "average of $\theta_0$'s". Therefore, we ought to be able to substitute the manifest probability $P$ for $P_{\theta_0}$:

Theorem 3.3 Suppose that conditions (A1) through (A5) hold. Let $\hat\theta_n$ be defined by (3) or (4), and $\hat\sigma_n$ be the square root of $\{I^{(n)}(\hat\theta_n)\}^{-1}$. Then, for $-\infty < a < b < \infty$, the posterior probability of $\hat\theta_n + a\hat\sigma_n < \theta < \hat\theta_n + b\hat\sigma_n$, namely

$$\int_{\hat\theta_n + a\hat\sigma_n}^{\hat\theta_n + b\hat\sigma_n} \Pi_n(\theta \mid X_1, \ldots, X_n)\,d\theta,$$

tends to

$$(2\pi)^{-1/2}\int_a^b e^{-u^2/2}\,du$$

in manifest probability $P$.


Summarizing the last few paragraphs, Theorem 3.1 implies that asymptotic posterior normality holds for any randomly chosen examinee with ability $\theta_0$. On the other hand, Theorem 3.3 ensures that this asymptotic property holds for any examinee randomly sampled from the population. In other words, in the first case the examinee is sampled from a subpopulation (those with ability $\theta_0$), and in the second from the whole population. Therefore, Theorem 3.3 has a more general meaning. (The original idea of Theorem 3.3 was proposed by Brian Junker in personal conversation with one of the authors.)

4  Conclusions 

The asymptotic posterior normality of latent variable distributions has been established under very general and appropriate hypotheses. This result has (at least) two important implications. First, it provides a probabilistic basis for assessing the accuracy of ability estimation in the long test case. Second, it provides an important first step in making rigorous the Dutch Identity conjecture (Holland, 1990), which, roughly speaking, claims that only 2 parameters per item are required in order to obtain good long-test model fit for unidimensional test data.

Further, the consistency of the MLE of $\theta$ has been discussed. It is interesting to note that our proof of the consistency of $\hat\theta_n$ is very similar to Wald's (1949) proof for the case of i.i.d. $X_1, \ldots, X_n$. It is worth remarking that the general IRT model (that is, non-identically distributed responses) yields asymptotic results as powerful as those of the i.i.d. model, the favorite model of most statisticians, which has so many good qualities.




Finally, we note that for general multidimensional IRT models, asymptotic posterior normality of the random vector $\theta$ given the test response $X_1, \ldots, X_n$ can be proved under suitable regularity conditions.
Appendix:  Proofs  of  Main  Theorems

In  this  appendix  we  will  prove  the  results  introduced  in  Section  3. 


A  The  Proof  of  Convergence  in  Probability 

The proof of Theorem 3.1 is based on Lemma 3.1, Lemma 3.2, and Corollary 3.1. Before giving the proofs, we introduce two important theorems, from real analysis and probability theory respectively:

Theorem A.1 (Heine-Borel covering theorem) (Billingsley, p. 566)

If $[a, b] \subset \bigcup_{k=1}^{\infty}(a_k, b_k)$, then $[a, b] \subset \bigcup_{k=1}^{n}(a_k, b_k)$ for some $n$.

Remark: Equivalent to the above theorem is the assertion that a bounded, closed set is compact.²

Theorem A.2 (Strong law of large numbers) (Serfling, p. 27)

Let $X_1, X_2, \ldots$ be independent with means $\mu_1, \mu_2, \ldots$ and variances $\sigma_1^2, \sigma_2^2, \ldots$. If the series $\sum_{j=1}^{\infty} \sigma_j^2/j^2$ converges, then

$$n^{-1}\sum_{j=1}^{n} X_j - n^{-1}\sum_{j=1}^{n} \mu_j \to 0 \quad \text{with probability one.}$$

Proof of Lemma 3.1:

Remark. The proof of Lemma 3.1 extends Walker's result, which only covers the i.i.d. case. The strategy used in the proof consists of two steps:

(a) to prove that, for any $\theta_i \ne \theta_0$, there exists $\delta_i > 0$ such that

$$\lim_{n\to\infty} P_{\theta_0}\left\{\sup_{|\theta - \theta_i| < \delta_i} n^{-1}[L_n(\theta) - L_n(\theta_0)] < -c_i(\delta_i)\right\} = 1.$$

We put the subscript $i$ here because we only need a finite number of such $\theta_i$'s.

²A set $C$ is defined to be compact if each cover of it by open sets has a finite subcover; that is, if $\{G_\theta : \theta \in \Theta\}$ covers $C$ and each $G_\theta$ is open, then some finite subcollection $\{G_{\theta_1}, \ldots, G_{\theta_k}\}$ covers $C$.
(b) to use Theorem A.1 to cover $\{|\theta - \theta_0| \ge \delta\} \cap C$, where $C$ is a compact set, by a finite number of open sets $|\theta - \theta_i| < \delta_i$, $i = 1, \ldots, m$.

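For any $\theta \ne \theta_0$, recalling the definition (7) of $Z_j(\theta)$, and (9), it follows that

$$n^{-1}[L_n(\theta) - L_n(\theta_0)] = n^{-1}\sum_{j=1}^{n} Z_j(\theta). \qquad (29)$$

Now, from (7),

$$E_{\theta_0} Z_j(\theta) = P_j(\theta_0)\log\left\{\frac{P_j(\theta)}{P_j(\theta_0)}\right\} + \{1 - P_j(\theta_0)\}\log\left\{\frac{1 - P_j(\theta)}{1 - P_j(\theta_0)}\right\}. \qquad (30)$$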

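In order to apply Theorem A.2 to $\{Z_j(\theta)\}$, we need to estimate var$(Z_j(\theta))$. Writing $Z_j(\theta)$ using the logit function (see (6)),

$$Z_j(\theta) = X_j\{\lambda_j(\theta) - \lambda_j(\theta_0)\} + \log\left\{\frac{1 - P_j(\theta)}{1 - P_j(\theta_0)}\right\}, \qquad (31)$$

it follows that

$$\mathrm{var}(Z_j(\theta)) = \mathrm{var}(X_j)\{\lambda_j(\theta) - \lambda_j(\theta_0)\}^2 = P_j(\theta_0)\{1 - P_j(\theta_0)\}\{\lambda_j(\theta) - \lambda_j(\theta_0)\}^2. \qquad (32)$$

Since, for any fixed $\theta$, $\lambda_j(\theta)$ is bounded in absolute value uniformly in $j$ (assumption (A3)), there exists a constant $0 < M(\theta) < \infty$ such that

$$|\mathrm{var}(Z_j(\theta))| \le M(\theta) \quad \text{for all } j, \qquad (33)$$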
for some $c(\theta) > 0$.
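Equation (32) and the bound (33) can be verified from the two-point distribution of $X_j$. The sketch below, for hypothetical 2PL items, compares the variance of $Z_j(\theta)$ computed directly with the closed form $P_j(\theta_0)\{1 - P_j(\theta_0)\}\{\lambda_j(\theta) - \lambda_j(\theta_0)\}^2$. Because $P_j(\theta_0)\{1 - P_j(\theta_0)\} \le 1/4$, one admissible choice in (33) is $M(\theta) = \frac{1}{4}\{\sup_j|\lambda_j(\theta)| + \sup_j|\lambda_j(\theta_0)|\}^2$.

```python
import math

def icc(theta, a, b):
    """Hypothetical 2PL curve P_j(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def logit_fn(theta, a, b):
    """lambda_j(theta), equation (6)."""
    p = icc(theta, a, b)
    return math.log(p / (1.0 - p))

def z_term(xj, theta, theta0, a, b):
    """Z_j(theta), equation (7), for a single response xj in {0, 1}."""
    p, p0 = icc(theta, a, b), icc(theta0, a, b)
    return xj * math.log(p / p0) + (1 - xj) * math.log((1.0 - p) / (1.0 - p0))

def var_z_exact(theta, theta0, a, b):
    """var_{theta0}(Z_j(theta)) computed directly from the two-point law of X_j."""
    p0 = icc(theta0, a, b)
    z1, z0 = z_term(1, theta, theta0, a, b), z_term(0, theta, theta0, a, b)
    mean = p0 * z1 + (1.0 - p0) * z0
    return p0 * (z1 - mean) ** 2 + (1.0 - p0) * (z0 - mean) ** 2

def var_z_formula(theta, theta0, a, b):
    """P_j(theta0) {1 - P_j(theta0)} {lambda_j(theta) - lambda_j(theta0)}^2, as in (32)."""
    p0 = icc(theta0, a, b)
    return p0 * (1.0 - p0) * (logit_fn(theta, a, b) - logit_fn(theta0, a, b)) ** 2

for theta, theta0, a, b in [(1.0, 0.0, 1.2, 0.3), (-1.5, 0.5, 0.7, -0.8)]:
    print(var_z_exact(theta, theta0, a, b), var_z_formula(theta, theta0, a, b))
```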


Suppose  N0  is  the  closed  interval  assumed  in  condition  (A2).  For  any  fixed  O'  £ 
N0  C  0  and  for  any  0  satisfying  \0  —  0'\  <  6,  define  Hj(0‘ ,0)  by  the  following: 


H:(0\0)  =  \  log-^|  +  ilog 


1  -  Pj(0) 

1  -  P3(0') 


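Since $P_j(\theta)$ is strictly increasing in $\theta$, the cases $P_j(\theta') = 1$ and $P_j(\theta') = 0$ can be ruled out, so $H_j(\theta,\theta')$ is finite. As a continuous function of $\theta$, $H_j(\theta,\theta')$ attains a maximum over $[\theta'-\delta, \theta'+\delta]$. Denote this maximum by $H_j(\delta,\theta')$; that is, there exists $\theta^{(\theta',j,\delta)} \in [\theta'-\delta, \theta'+\delta]$ such that

$$H_j(\delta,\theta') = H_j(\theta^{(\theta',j,\delta)},\theta') = \max_{|\theta-\theta'|\le\delta} H_j(\theta,\theta'). \qquad (34)$$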

Clearly, for each $j$,

$$\lim_{\delta\to 0} H_j(\delta,\theta') = 0.$$

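Now, for $|\theta-\theta'| \le \delta$, we have

$$Z_j(\theta)-Z_j(\theta') = X_j\log\frac{P_j(\theta)}{P_j(\theta')} + (1-X_j)\log\frac{1-P_j(\theta)}{1-P_j(\theta')} \le \Big|\log\frac{P_j(\theta)}{P_j(\theta')}\Big| + \Big|\log\frac{1-P_j(\theta)}{1-P_j(\theta')}\Big| \qquad (35)$$

$$= H_j(\theta,\theta') \le H_j(\delta,\theta'). \qquad (36)$$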

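We shall now prove that $\{P_j(\theta)\}$ is equicontinuous (footnote 3). From (A2), $P_j'(\theta)$ is continuous and bounded in absolute value uniformly in $j$ and in $\theta \in N_0$. By the mean value theorem, for $\theta, \theta' \in N_0$,

$$|P_j(\theta)-P_j(\theta')| = |P_j'(\xi_j)|\,|\theta-\theta'| \le C_p|\theta-\theta'| \quad\text{for all } j, \qquad (37)$$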

3 A family of functions $\{P_j\}$ defined on $(-\infty,\infty)$ is said to be equicontinuous if, given $\varepsilon > 0$, there exists a number $\delta > 0$ such that $|x'-x''| < \delta$ implies $|P_j(x')-P_j(x'')| < \varepsilon$ for all $x'$, $x''$ and for all $j$.
where $\xi_j$ is a point between $\theta$ and $\theta'$ for each $j$, and $C_p = \sup_j\sup_{\theta\in N_0}|P_j'(\theta)|$, which is finite. Let $\delta = \varepsilon/C_p$ for $\varepsilon > 0$; then

$$|\theta-\theta'| < \delta \implies |P_j(\theta)-P_j(\theta')| < \varepsilon \quad\text{for all } j.$$


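Recall that $\theta'$ here is any fixed point in $N_0$. Note that

$$H_j(\delta,\theta') \le \max_{\theta\in[\theta'-\delta,\,\theta'+\delta]}\Big|\log\frac{P_j(\theta)}{P_j(\theta')}\Big| + \max_{\theta\in[\theta'-\delta,\,\theta'+\delta]}\Big|\log\frac{1-P_j(\theta)}{1-P_j(\theta')}\Big|.$$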

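Since $P_j(\theta)$ is strictly increasing in $\theta$, each maximum is attained at an endpoint of the interval:

$$\max_{\theta\in[\theta'-\delta,\,\theta'+\delta]}\Big|\log\frac{P_j(\theta)}{P_j(\theta')}\Big| = \max\Big\{\Big|\log\frac{P_j(\theta'-\delta)}{P_j(\theta')}\Big|,\ \Big|\log\frac{P_j(\theta'+\delta)}{P_j(\theta')}\Big|\Big\}$$

and

$$\max_{\theta\in[\theta'-\delta,\,\theta'+\delta]}\Big|\log\frac{1-P_j(\theta)}{1-P_j(\theta')}\Big| = \max\Big\{\Big|\log\frac{1-P_j(\theta'-\delta)}{1-P_j(\theta')}\Big|,\ \Big|\log\frac{1-P_j(\theta'+\delta)}{1-P_j(\theta')}\Big|\Big\}.$$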


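Therefore, since $\max\{u,v\} \le u+v$ for $u, v \ge 0$,

$$n^{-1}\sum_{j=1}^n H_j(\delta,\theta') \le n^{-1}\sum_{j=1}^n\Big|\log\frac{P_j(\theta'-\delta)}{P_j(\theta')}\Big| + n^{-1}\sum_{j=1}^n\Big|\log\frac{P_j(\theta'+\delta)}{P_j(\theta')}\Big| + n^{-1}\sum_{j=1}^n\Big|\log\frac{1-P_j(\theta'-\delta)}{1-P_j(\theta')}\Big| + n^{-1}\sum_{j=1}^n\Big|\log\frac{1-P_j(\theta'+\delta)}{1-P_j(\theta')}\Big|.$$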

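From the equicontinuity of $\{P_j(\theta)\}$, and since by (A3) $P_j(\theta')$ is bounded away from 0 and 1 uniformly in $j$, for arbitrary $\varepsilon > 0$ there exists a sufficiently small $\delta > 0$ such that, for all $j$,

$$\Big|\log\frac{P_j(\theta'+\delta')}{P_j(\theta')}\Big| < \frac{\varepsilon}{4} \quad\text{and}\quad \Big|\log\frac{1-P_j(\theta'+\delta')}{1-P_j(\theta')}\Big| < \frac{\varepsilon}{4},$$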

where $\delta' = \delta$ or $-\delta$. Thus, for all $n$ and for all sufficiently small $\delta$,

$$n^{-1}\sum_{j=1}^n H_j(\delta,\theta') < \varepsilon.$$


Therefore

$$\limsup_{n\to\infty} n^{-1}\sum_{j=1}^n H_j(\delta,\theta') \to 0 \quad\text{as } \delta\to 0. \qquad (38)$$
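The convergence in (38) can be seen numerically. The sketch below assumes a hypothetical bank of 3PL items with randomly drawn parameters and computes $n^{-1}\sum_j H_j(\delta,\theta')$ exactly, using the fact noted above that, because $P_j$ is strictly increasing, the maximum in (34) is attained at $\theta'-\delta$ or $\theta'+\delta$ (rather than the four-term upper bound).

```python
import math
import random


def p3pl(theta, a, b, c):
    """Hypothetical 3PL item response function (illustration only)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def h_point(theta, theta_p, item):
    """H_j(theta, theta'): sum of the two absolute log-ratios defining H_j."""
    p, p_ref = p3pl(theta, *item), p3pl(theta_p, *item)
    return abs(math.log(p / p_ref)) + abs(math.log((1.0 - p) / (1.0 - p_ref)))


def h_max(delta, theta_p, item):
    """H_j(delta, theta') of (34). Both absolute log-ratios grow as theta moves
    away from theta', so the maximum over [theta' - delta, theta' + delta]
    is attained at one of the two endpoints."""
    return max(h_point(theta_p - delta, theta_p, item),
               h_point(theta_p + delta, theta_p, item))


def mean_h(delta, theta_p, items):
    """n^{-1} * sum_j H_j(delta, theta') over a test of n items."""
    return sum(h_max(delta, theta_p, it) for it in items) / len(items)


rng = random.Random(1)
# Hypothetical item bank: a in [0.8, 2], b in [-2, 2], c in [0, 0.25].
items = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 2.0), rng.uniform(0.0, 0.25))
         for _ in range(1000)]
for delta in (0.5, 0.1, 0.01, 0.001):
    print(delta, mean_h(delta, 0.0, items))
```

For small δ each $H_j(\delta,\theta')$ is approximately $\delta\,\Lambda_j'(\theta')$, so in this example the average shrinks roughly linearly in δ.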
We shall now prove that for any $\theta_i \ne \theta_0$ there exist a sufficiently small $\delta_i > 0$ and a sufficiently small $c_i > 0$ such that

$$\lim_{n\to\infty} P_{\theta_0}\Big\{\sup_{|\theta-\theta_i|<\delta_i} n^{-1}[L_n(\theta)-L_n(\theta_0)] < -c_i\Big\} = 1. \qquad (39)$$

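For $\theta \in \{\theta : |\theta-\theta_i| < \delta\}$, according to (29), (7), and (36),

$$n^{-1}[L_n(\theta)-L_n(\theta_0)] = n^{-1}[L_n(\theta_i)-L_n(\theta_0)] + n^{-1}[L_n(\theta)-L_n(\theta_i)] \le n^{-1}[L_n(\theta_i)-L_n(\theta_0)] + n^{-1}\sum_{j=1}^n H_j(\delta,\theta_i).$$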

So we have

$$\sup_{|\theta-\theta_i|<\delta} n^{-1}[L_n(\theta)-L_n(\theta_0)] \le n^{-1}[L_n(\theta_i)-L_n(\theta_0)] + n^{-1}\sum_{j=1}^n H_j(\delta,\theta_i).$$

Substituting $\theta_i$ for $\theta$ in (33), we have

$$P_{\theta_0}\Big\{\limsup_{n\to\infty} n^{-1}[L_n(\theta_i)-L_n(\theta_0)] \le -c(\theta_i) = -c_i\Big\} = 1, \qquad (40)$$

where $c_i$ is positive for all $i$, and from (38) we have, for all $i$,

$$\limsup_{n\to\infty} n^{-1}\sum_{j=1}^n H_j(\delta,\theta_i) \to 0 \quad\text{as } \delta\to 0.$$

So there exist an open interval $|\theta-\theta_i| < \delta_i$ and a positive number $c_i$ such that (39) holds.

Recall that in assumption (A1) $\Theta$ can be defined by two different domains. In the following, we discuss these two cases separately.

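Case 1: If $\Theta$ is a bounded closed subset of $(-\infty,\infty)$, then $\Theta \setminus \{\theta : |\theta-\theta_0| < \delta\}$ is compact; according to Theorem A.1 it can be covered by finitely many, say $m$, such open intervals

$$(\theta_1-\delta_1,\theta_1+\delta_1),\ (\theta_2-\delta_2,\theta_2+\delta_2),\ \dots,\ (\theta_m-\delta_m,\theta_m+\delta_m).$$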
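Define the event $A_i^{(n)}$ by

$$A_i^{(n)} = \Big\{\sup_{|\theta-\theta_i|<\delta_i} n^{-1}[L_n(\theta)-L_n(\theta_0)] < -c_i\Big\}. \qquad (41)$$

Since $P\{A_i^{(n)}\} \to 1$ for each $i$ as $n\to\infty$ and $P\{\bigcap_{i=1}^m A_i^{(n)}\} \ge 1-\sum_{i=1}^m P\{(A_i^{(n)})^c\}$ with $m$ finite, we have

$$P\Big\{\bigcap_{i=1}^m A_i^{(n)}\Big\} \to 1.$$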


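Now we replace $c_i$ in (39) with

$$k(\delta) = \min\{c_1, c_2, \dots, c_m\}.$$

On $\bigcap_{i=1}^m A_i^{(n)}$ the supremum of $n^{-1}[L_n(\theta)-L_n(\theta_0)]$ over $\Theta\setminus\{\theta : |\theta-\theta_0|<\delta\}$ is less than $-k(\delta)$. Therefore, (39) holding for all $i$ implies (24).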

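Case 2: If $\Theta$ is not bounded, for example $\Theta = (-\infty,\infty)$, we will show that

$$\lim_{n\to\infty} P\Big\{\sup_{|\theta|\ge A} n^{-1}[L_n(\theta)-L_n(\theta_0)] < -c_A < 0\Big\} = 1 \qquad (42)$$

for a sufficiently large positive number $A$. Now

$$\{\theta : |\theta-\theta_0| \ge \delta\} \cap \{\theta : |\theta| \le A\}$$

is a bounded compact set, to which Case 1 applies, so we can finally obtain (24) from (42) by defining

$$k(\delta) = \min\{c_1, c_2, \dots, c_m, c_A\}.$$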


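To complete the proof, we have to prove (42). Let $\theta_A$ denote $A$ or $-A$, according as the range $\theta \ge A$ or $\theta \le -A$ is being considered (so $|\theta_A| = A$), and rewrite

$$\sup_{|\theta|\ge A} n^{-1}[L_n(\theta)-L_n(\theta_0)] = n^{-1}[L_n(\theta_A)-L_n(\theta_0)] + \sup_{|\theta|\ge A} n^{-1}[L_n(\theta)-L_n(\theta_A)], \qquad (43)$$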


where

$$n^{-1}[L_n(\theta)-L_n(\theta_A)] = n^{-1}\sum_{j=1}^n X_j\log\frac{P_j(\theta)}{P_j(\theta_A)} + n^{-1}\sum_{j=1}^n (1-X_j)\log\frac{1-P_j(\theta)}{1-P_j(\theta_A)}.$$


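Since $X_j = 0$ or $1$ and $P_j(\theta)$ is strictly increasing in $\theta$, for $\theta \ge A$ the terms of the first sum are nonnegative and those of the second are nonpositive; hence

$$\sup_{\theta\ge A} n^{-1}[L_n(\theta)-L_n(A)] \le \sup_{\theta\ge A} n^{-1}\sum_{j=1}^n \log\frac{P_j(\theta)}{P_j(A)},$$

and, symmetrically, for $\theta \le -A$,

$$\sup_{\theta\le -A} n^{-1}[L_n(\theta)-L_n(-A)] \le \sup_{\theta\le -A} n^{-1}\sum_{j=1}^n \log\frac{1-P_j(\theta)}{1-P_j(-A)}.$$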

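Since each item response function has horizontal asymptotes as $\theta\to+\infty$ and $\theta\to-\infty$, we can prove that

$$\limsup_{n\to\infty}\sup_{\theta\ge A} n^{-1}\sum_{j=1}^n \log\frac{P_j(\theta)}{P_j(A)} \to 0 \quad\text{and}\quad \limsup_{n\to\infty}\sup_{\theta\le -A} n^{-1}\sum_{j=1}^n \log\frac{1-P_j(\theta)}{1-P_j(-A)} \to 0$$

as $A\to\infty$. Therefore we have

$$\limsup_{n\to\infty}\sup_{|\theta|\ge A} n^{-1}[L_n(\theta)-L_n(\theta_A)] \to 0 \quad\text{as } A\to\infty. \qquad (44)$$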

Substituting $\theta_A$ for $\theta$ in (33), we have

$$P_{\theta_0}\Big\{\limsup_{n\to\infty} n^{-1}[L_n(\theta_A)-L_n(\theta_0)] \le -c_A\Big\} = 1. \qquad (45)$$

Applying (44) and (45) to (43) gives (42). Therefore (42) holds. ■
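Lemma 3.1 states that, with probability tending to one, the normalized log-likelihood ratio stays below a negative constant outside a δ-neighborhood of $\theta_0$. The simulation below illustrates this under assumptions that are not part of the lemma: hypothetical 3PL items drawn at random, $\theta_0 = 0.5$, $\delta = 1$, and a supremum approximated on a grid restricted to $[-4, 4]$. It illustrates the statement for one long test; it does not prove it.

```python
import math
import random


def p3pl(theta, a, b, c):
    """Hypothetical 3PL item response function (illustration only)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_lik(theta, responses, items):
    """L_n(theta) = sum_j [X_j log P_j(theta) + (1 - X_j) log(1 - P_j(theta))]."""
    total = 0.0
    for x, it in zip(responses, items):
        p = p3pl(theta, *it)
        total += math.log(p) if x else math.log(1.0 - p)
    return total


def sup_scaled_llr(theta0, delta, responses, items, lo=-4.0, hi=4.0, steps=161):
    """Grid approximation of sup_{|theta - theta0| >= delta} n^{-1}[L_n(theta) - L_n(theta0)],
    restricted to the bounded window [lo, hi] (an assumption of this sketch)."""
    n = len(items)
    base = log_lik(theta0, responses, items)
    grid = [lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
    return max((log_lik(t, responses, items) - base) / n
               for t in grid if abs(t - theta0) >= delta)


def simulate(n, theta0, seed=2):
    """Draw n hypothetical items and one response pattern at ability theta0."""
    rng = random.Random(seed)
    items = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 2.0), rng.uniform(0.0, 0.25))
             for _ in range(n)]
    responses = [1 if rng.random() < p3pl(theta0, *it) else 0 for it in items]
    return responses, items


responses, items = simulate(n=3000, theta0=0.5)
print(sup_scaled_llr(0.5, 1.0, responses, items))
```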


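Proof of Corollary 3.1: The MLE $\hat\theta_n$, if it exists, obviously satisfies

$$L_n(\hat\theta_n)-L_n(\theta_0) = \log\frac{P_n(X_1,\dots,X_n\mid\hat\theta_n)}{P_n(X_1,\dots,X_n\mid\theta_0)} \ge 0 \qquad (46)$$

for all $n$ and for all $X_1,\dots,X_n$. It is sufficient to prove that for any $\varepsilon > 0$ and $\delta > 0$ there exists $N(\varepsilon,\delta)$ such that

$$\mathrm{Prob}\{|\hat\theta_n-\theta_0| < \delta\} > 1-\varepsilon \quad\text{for all } n > N(\varepsilon,\delta).$$

Suppose $\hat\theta_n$ is not consistent; then there exist $c_0 > 0$ and $\delta_0 > 0$ such that, for any $N$, there exists some $n > N$ with

$$\mathrm{Prob}\{|\hat\theta_n-\theta_0| \ge \delta_0\} \ge c_0.$$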
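Therefore we can obtain a subsequence $\{\hat\theta_{n_i}\}$ such that

$$\mathrm{Prob}\{|\hat\theta_{n_i}-\theta_0| \ge \delta_0\} \ge c_0 \quad\text{for all } n_i. \qquad (47)$$

Thus, by Fatou's lemma,

$$c_0 \le \limsup_{i\to\infty}\mathrm{Prob}\{|\hat\theta_{n_i}-\theta_0| \ge \delta_0\} \le \mathrm{Prob}\Big\{\limsup_{i\to\infty}\,[|\hat\theta_{n_i}-\theta_0| \ge \delta_0]\Big\}.$$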
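It is obvious that the event

$$\limsup_{i\to\infty}\,[|\hat\theta_{n_i}-\theta_0| \ge \delta_0]$$

implies that

$$\sup_{|\theta-\theta_0|\ge\delta_0}[L_n(\theta)-L_n(\hat\theta_n)] \ge 0 \quad\text{for infinitely many } n,$$

because $\theta = \hat\theta_n$ is a possible value. But then, according to (46), the event

$$\sup_{|\theta-\theta_0|\ge\delta_0}[L_n(\theta)-L_n(\theta_0)] \ge 0 \quad\text{for infinitely many } n$$

has a probability greater than or equal to $c_0$. In particular, by (46) and (47), for every $n_i$ the event $\{|\hat\theta_{n_i}-\theta_0| \ge \delta_0\}$ is contained in $\{\sup_{|\theta-\theta_0|\ge\delta_0}[L_{n_i}(\theta)-L_{n_i}(\theta_0)] \ge 0\}$, so the latter has probability at least $c_0$ for all $n_i$. This contradicts (24), which implies that for any $c > 0$ there exists $N$ such that

$$\mathrm{Prob}\Big\{\sup_{|\theta-\theta_0|\ge\delta_0}[L_n(\theta)-L_n(\theta_0)] \ge 0\Big\} < c \quad\text{for all } n > N.$$

This completes the proof. ■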
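The consistency asserted by Corollary 3.1 can be illustrated in the same spirit. The sketch assumes a 2PL model with hypothetical items drawn at random; because the 2PL log likelihood is concave in θ, its score is strictly decreasing, so the MLE can be located by bisection. The function returns `None` when the score has no root in the search interval, for example for an all-correct pattern, which mirrors the qualification "if it exists" in the proof above.

```python
import math
import random


def p2pl(theta, a, b):
    """Hypothetical 2PL item response function (illustration only)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def score(theta, responses, items):
    """L_n'(theta) for the 2PL model: sum_j a_j [X_j - P_j(theta)]; strictly decreasing."""
    return sum(a * (x - p2pl(theta, a, b)) for x, (a, b) in zip(responses, items))


def mle_bisection(responses, items, lo=-6.0, hi=6.0, tol=1e-8):
    """MLE of theta by bisection on the score. Returns None when the score has no
    root in [lo, hi] (for example an all-correct or all-wrong response pattern)."""
    if score(lo, responses, items) <= 0 or score(hi, responses, items) >= 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, responses, items) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return 0.5 * (lo + hi)


def draw_test(n, theta0, rng):
    """Hypothetical 2PL items and a simulated response pattern at ability theta0."""
    items = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(n)]
    responses = [1 if rng.random() < p2pl(theta0, a, b) else 0 for a, b in items]
    return responses, items


rng = random.Random(3)
theta0 = 0.7
for n in (50, 500, 5000):
    responses, items = draw_test(n, theta0, rng)
    print(n, mle_bisection(responses, items))
```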

Proof of Lemma 3.2: Without loss of generality, we first assume that $\hat\theta_n \in [|\theta-\theta_0| < \delta] \subset N_0$. Since $\hat\theta_n$ is consistent, the probability that $\hat\theta_n$ lies in this neighborhood of $\theta_0$ is close to one when $n$ is sufficiently large.

The second derivative of the log likelihood function can be written as

$$L_n''(\theta) = \sum_{j=1}^n \Lambda_j''(\theta)[X_j-P_j(\theta)] - \sum_{j=1}^n I_j(\theta). \qquad (48)$$
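To prove (48), first notice that, since $L_n$ is the sum of the single-item log likelihoods, it suffices to prove it for $n = 1$, that is,

$$L_1''(\theta) = \Lambda_1''(\theta)[X_1-P_1(\theta)] - I_1(\theta). \qquad (49)$$

Note that

$$L_1(\theta) = \Lambda_1(\theta)X_1 + \log(1-P_1(\theta)),$$

so that

$$L_1''(\theta) = \Lambda_1''(\theta)X_1 + [\log(1-P_1(\theta))]''.$$

Comparing this with (49), it remains to show that

$$-[\log(1-P_1(\theta))]'' = \Lambda_1''(\theta)P_1(\theta) + I_1(\theta). \qquad (50)$$

However, by definition,

$$I_1(\theta) = E_\theta[-L_1''(\theta)] = -\Lambda_1''(\theta)P_1(\theta) - [\log(1-P_1(\theta))]'',$$

which is equivalent to (50).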
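Identity (49) can be spot-checked with finite differences. The sketch below assumes a hypothetical 3PL item (for a 2PL item $\Lambda_j$ is linear, so $\Lambda_j''$ vanishes and the check would be less informative) and takes $I_1 = P_1'^2/[P_1(1-P_1)]$, the Fisher information of a Bernoulli response, which agrees with the definition $I_1(\theta) = E_\theta[-L_1''(\theta)]$ used above.

```python
import math


def p3pl(theta, a, b, c):
    """Hypothetical 3PL item response function (illustration only)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def loglik1(theta, x, item):
    """Single-item log likelihood L_1(theta) = X Lambda(theta) + log(1 - P(theta))."""
    p = p3pl(theta, *item)
    return x * math.log(p / (1.0 - p)) + math.log(1.0 - p)


def lam(theta, item):
    """Logit Lambda(theta) = log(P / (1 - P))."""
    p = p3pl(theta, *item)
    return math.log(p / (1.0 - p))


def d1(f, t, h=1e-5):
    """Central first difference approximating f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


def d2(f, t, h=1e-4):
    """Central second difference approximating f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


def check_identity(theta, x, item):
    """Return (lhs, rhs) of (49): L_1''(theta) and Lambda''(theta)[X - P(theta)] - I(theta)."""
    p = p3pl(theta, *item)
    dp = d1(lambda t: p3pl(t, *item), theta)
    info = dp * dp / (p * (1.0 - p))  # Bernoulli Fisher information
    lhs = d2(lambda t: loglik1(t, x, item), theta)
    rhs = d2(lambda t: lam(t, item), theta) * (x - p) - info
    return lhs, rhs


item = (1.5, 0.3, 0.2)  # hypothetical (a, b, c)
for x in (0, 1):
    print(x, check_identity(-0.4, x, item))
```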


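Consider the numerator of $|R_n|$. Using $|X_j-P_j(\theta_n^*)| \le 1$,

$$\begin{aligned}
|L_n''(\theta_n^*) + I^{(n)}(\hat\theta_n)| &= \Big|\sum_{j=1}^n[\Lambda_j''(\theta_n^*)-\Lambda_j''(\theta_0)][X_j-P_j(\theta_n^*)] + \sum_{j=1}^n\Lambda_j''(\theta_0)[X_j-P_j(\theta_0)] \\
&\qquad + \sum_{j=1}^n\Lambda_j''(\theta_0)[P_j(\theta_0)-P_j(\theta_n^*)] + \sum_{j=1}^n[I_j(\hat\theta_n)-I_j(\theta_n^*)]\Big| \\
&\le \sum_{j=1}^n|\Lambda_j''(\theta_n^*)-\Lambda_j''(\theta_0)| + \Big|\sum_{j=1}^n\Lambda_j''(\theta_0)[X_j-P_j(\theta_0)]\Big| \\
&\qquad + \Big|\sum_{j=1}^n\Lambda_j''(\theta_0)[P_j(\theta_0)-P_j(\theta_n^*)]\Big| + \sum_{j=1}^n|I_j(\hat\theta_n)-I_j(\theta_n^*)|. \qquad (51)
\end{aligned}$$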
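Note that $\theta_n^*$ depends on $\theta$ and $\hat\theta_n$ through the Taylor expansion and that the distribution of $\hat\theta_n$ depends on $\theta_0$. From (37), and since $\{\Lambda_j''(\theta_0)\}$ is bounded in absolute value uniformly in $j$ (assumption (A4)),

$$\Big|\sum_{j=1}^n\Lambda_j''(\theta_0)[P_j(\theta_0)-P_j(\theta_n^*)]\Big| \le |\theta_n^*-\theta_0|\,n\,C_p\sup_j|\Lambda_j''(\theta_0)|. \qquad (52)$$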

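From the mean value theorem,

$$|\Lambda_j''(\theta_n^*)-\Lambda_j''(\theta_0)| = |\Lambda_j'''(\theta_n^{(1)})|\,|\theta_n^*-\theta_0|$$

and

$$|I_j(\hat\theta_n)-I_j(\theta_n^*)| = |I_j'(\theta_n^{(2)})|\,|\hat\theta_n-\theta_n^*|,$$

where $\theta_n^{(1)}$ is a point between $\theta_n^*$ and $\theta_0$, and $\theta_n^{(2)}$ is a point between $\hat\theta_n$ and $\theta_n^*$. According to assumption (A4), the third derivative of the logit function, $\Lambda_j'''(\theta)$, and the first derivative of the information function, $I_j'(\theta)$, are bounded in absolute value uniformly in $j$ and in $\theta$ (by $C_\Lambda$ and $C_I$, respectively); therefore

$$\sum_{j=1}^n|\Lambda_j''(\theta_n^*)-\Lambda_j''(\theta_0)| \le |\theta_n^*-\theta_0|\,n\,C_\Lambda, \qquad (53)$$

$$\sum_{j=1}^n|I_j(\hat\theta_n)-I_j(\theta_n^*)| \le |\hat\theta_n-\theta_n^*|\,n\,C_I. \qquad (54)$$

Note that $C_p$, $C_\Lambda$, and $C_I$ are finite positive numbers and are independent of $j$.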


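We shall now prove

$$\Big|\sum_{j=1}^n\Lambda_j''(\theta_0)[X_j-P_j(\theta_0)]\Big| = O_p(n^{1/2}). \qquad (55)$$

(See Footnote 4.) Assumption (A4) ensures that $\{\Lambda_j''(\theta_0)\}$ is bounded in absolute value uniformly in $j$. By Chebyshev's inequality and the independence of the $X_j$, for some $M > 0$,

$$P\Big\{\Big|\sum_{j=1}^n\Lambda_j''(\theta_0)[X_j-P_j(\theta_0)]\Big| \ge n^{1/2}K\Big\} \le \frac{\sum_{j=1}^n[\Lambda_j''(\theta_0)]^2P_j(\theta_0)[1-P_j(\theta_0)]}{nK^2} \le MK^{-2};$$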

4 The notation $a_n = O_p(b_n)$ means that $a_n$ is stochastically bounded by $b_n$; that is, $a_n = O_p(b_n)$ if and only if for arbitrary $\varepsilon > 0$ there exist $M_\varepsilon$ and $N_\varepsilon$ such that

$$P\{|a_n/b_n| \le M_\varepsilon\} \ge 1-\varepsilon \quad\text{for all } n > N_\varepsilon.$$
that is, for arbitrary $\varepsilon > 0$, take $K = (M/\varepsilon)^{1/2}$; then we have

$$P\Big\{\Big|\sum_{j=1}^n\Lambda_j''(\theta_0)[X_j-P_j(\theta_0)]\Big|\Big/n^{1/2} < K\Big\} \ge 1-\varepsilon \quad\text{for all } n,$$

which gives (55).
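The Chebyshev argument for (55) can be illustrated by simulation. In the sketch below, which assumes hypothetical 3PL items and approximates $\Lambda_j''(\theta_0)$ by a second difference, $S_n$ denotes $\sum_j \Lambda_j''(\theta_0)[X_j-P_j(\theta_0)]$ (a name used only here), $M = \max_j[\Lambda_j''(\theta_0)]^2/4$, and $K = (M/\varepsilon)^{1/2}$ as in the text; the empirical frequency of $|S_n|/n^{1/2} \ge K$ should not exceed ε.

```python
import math
import random


def p3pl(theta, a, b, c):
    """Hypothetical 3PL item response function (illustration only)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def lam2(theta, item, h=1e-4):
    """Lambda_j''(theta) approximated by a central second difference of the logit."""
    def lam(t):
        p = p3pl(t, *item)
        return math.log(p / (1.0 - p))
    return (lam(theta + h) - 2.0 * lam(theta) + lam(theta - h)) / (h * h)


def chebyshev_check(n, theta0, eps, reps, seed=4):
    """Empirical P{|S_n| / sqrt(n) >= K} versus the Chebyshev bound M / K^2 = eps."""
    rng = random.Random(seed)
    items = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 2.0), rng.uniform(0.0, 0.25))
             for _ in range(n)]
    weights = [lam2(theta0, it) for it in items]
    probs = [p3pl(theta0, *it) for it in items]
    # M bounds var(S_n) / n because P(1 - P) <= 1/4 for every item.
    m_bound = max(w * w for w in weights) / 4.0
    k = math.sqrt(m_bound / eps)
    exceed = 0
    for _ in range(reps):
        s = sum(w * ((1 if rng.random() < p else 0) - p) for w, p in zip(weights, probs))
        if abs(s) / math.sqrt(n) >= k:
            exceed += 1
    return exceed / reps, eps


print(chebyshev_check(n=400, theta0=0.0, eps=0.1, reps=2000))
```

Chebyshev's bound is crude here; by the central limit theorem the actual exceedance frequency is typically far below ε, which is all that (55) needs.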

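Formulas (52), (53), (54), and (55) can be applied to (51) to get

$$|L_n''(\theta_n^*) + I^{(n)}(\hat\theta_n)| \le (|\theta_n^*-\theta_0| + |\hat\theta_n-\theta_n^*|)\,nC + O_p(n^{1/2}), \qquad (56)$$

where

$$C = C_p\sup_j|\Lambda_j''(\theta_0)| + C_\Lambda + C_I.$$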

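We shall now prove

$$\lim_{n\to\infty} P\{I^{(n)}(\hat\theta_n)/n > c/2 > 0\} = 1. \qquad (57)$$

By assumption (A4),

$$n^{-1}|I^{(n)}(\hat\theta_n)-I^{(n)}(\theta_0)| \le n^{-1}\sum_{j=1}^n|I_j(\hat\theta_n)-I_j(\theta_0)| \le |\hat\theta_n-\theta_0|\,C_I. \qquad (58)$$

By using the consistency of $\hat\theta_n$ and (58), we get

$$I^{(n)}(\hat\theta_n)/n - I^{(n)}(\theta_0)/n \to 0 \quad\text{in } P_{\theta_0} \text{ as } n\to\infty.$$

Thus, by assumption (A5), we have (57).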


From (56) and (57) we obtain

$$\sup_{|\theta-\theta_0|<\delta}|R_n(\theta, X_1,\dots,X_n)| \le \sup_{|\theta-\theta_0|<\delta}\frac{(|\theta_n^*-\theta_0| + |\hat\theta_n-\theta_n^*|)\,C}{I^{(n)}(\hat\theta_n)/n} + O_p(n^{-1/2}).$$
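Note that

$$|\theta_n^*-\hat\theta_n| \le |\theta_n^*-\theta_0| + |\hat\theta_n-\theta_0| \quad\text{and}\quad |\theta_n^*-\theta_0| \le |\theta-\theta_0| + |\hat\theta_n-\theta_0|,$$

where the second inequality follows from the fact that $\theta_n^*$ lies between $\theta$ and $\hat\theta_n$. Therefore

$$\sup_{|\theta-\theta_0|<\delta}|R_n(\theta, X_1,\dots,X_n)| \le \sup_{|\theta-\theta_0|<\delta}\frac{(3|\hat\theta_n-\theta_0| + 2|\theta-\theta_0|)\,C}{I^{(n)}(\hat\theta_n)/n} + O_p(n^{-1/2}).$$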


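For any $\varepsilon > 0$, choose $\delta > 0$ small enough that $8\delta C/c < \varepsilon$; then, recalling that $\hat\theta_n \to \theta_0$ in $P_{\theta_0}$ and (57), we have (23).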


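The above proof is based on the assumption that $\hat\theta_n$ is in the neighborhood $(\theta_0-\delta, \theta_0+\delta)$, so we have just proved that the conditional probability approaches one:

$$\lim_{n\to\infty} P[U_n \mid V_n] = 1, \qquad (59)$$

where

$$U_n = \Big\{\sup_{|\theta-\theta_0|<\delta}|R_n(\theta, X_1,\dots,X_n)| < \varepsilon\Big\}$$

and

$$V_n = \{\hat\theta_n \in [|\theta-\theta_0| < \delta] \subset N_0\}.$$

Since Corollary 3.1 implies

$$\lim_{n\to\infty} P[V_n] = 1, \qquad (60)$$

and $P[U_n] \ge P[U_n \mid V_n]\,P[V_n]$, (59) and (60) imply $\lim_{n\to\infty} P[U_n] = 1$. Thus we finish the proof. ■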


Proof of Theorem 3.1:

Remark: The following proof uses a methodology similar to that of Walker (1969). The proof itself does not use any i.i.d. assumption; instead, it depends only on the results of Lemma 3.1 and Lemma 3.2.


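As we discussed in Section 3.1, it suffices to prove (13) and (14). To prove (13), it suffices to prove (20) and (21). Let us start with (20). Rewrite $G_1$ as

$$\begin{aligned}
G_1 &= P_n(X_1,\dots,X_n\mid\hat\theta_n)\int_{|\theta-\theta_0|\ge\delta}\Pi(\theta)\exp\{L_n(\theta)-L_n(\hat\theta_n)\}\,d\theta \\
&= P_n(X_1,\dots,X_n\mid\hat\theta_n)\exp\{L_n(\theta_0)-L_n(\hat\theta_n)\}\int_{|\theta-\theta_0|\ge\delta}\Pi(\theta)\exp\{L_n(\theta)-L_n(\theta_0)\}\,d\theta.
\end{aligned}$$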

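Since $\hat\theta_n$ is an MLE,

$$L_n(\theta_0)-L_n(\hat\theta_n) \le 0, \qquad (61)$$

and therefore $\exp\{L_n(\theta_0)-L_n(\hat\theta_n)\} \le 1$. So, recalling that $\sigma_n^{-1} = \{I^{(n)}(\hat\theta_n)\}^{1/2}$, we have

$$\frac{G_1}{P_n(X_1,\dots,X_n\mid\hat\theta_n)\,\sigma_n} = \exp\{L_n(\theta_0)-L_n(\hat\theta_n)\}\{I^{(n)}(\hat\theta_n)\}^{1/2}\int_{|\theta-\theta_0|\ge\delta}\Pi(\theta)\exp\{L_n(\theta)-L_n(\theta_0)\}\,d\theta \le \{I^{(n)}(\hat\theta_n)\}^{1/2}G_0, \qquad (62)$$

where

$$G_0 = \int_{|\theta-\theta_0|\ge\delta}\Pi(\theta)\exp\{L_n(\theta)-L_n(\theta_0)\}\,d\theta.$$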

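By Lemma 3.1, for any $\delta > 0$, there exists $k(\delta) > 0$ such that

$$\lim_{n\to\infty} P_{\theta_0}\{U_n\} = 1, \qquad (63)$$

where

$$U_n = \Big[\sup_{|\theta-\theta_0|\ge\delta} n^{-1}[L_n(\theta)-L_n(\theta_0)] < -k(\delta) < 0\Big].$$

Define

$$V_n = [G_0 \le \exp\{-nk(\delta)\}]; \qquad (64)$$

notice that on $U_n$,

$$G_0 \le \exp\{-nk(\delta)\}\int\Pi(\theta)\,d\theta \le \exp\{-nk(\delta)\}.$$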


Because $U_n \subset V_n$, we have

$$\lim_{n\to\infty} P_{\theta_0}\{G_0 \le \exp\{-nk(\delta)\}\} = 1.$$


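Since

$$\{I^{(n)}(\hat\theta_n)\}^{1/2}\exp\{-nk(\delta)\} \to 0 \quad\text{in } P_{\theta_0} \text{ as } n\to\infty,$$

it follows (using (62)) that

$$\lim_{n\to\infty}\frac{G_1}{P_n(X_1,\dots,X_n\mid\hat\theta_n)\,\sigma_n} = 0 \quad\text{in } P_{\theta_0}. \qquad (65)$$

Thus (20) holds.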


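Now we prove (21). From (15), rewrite $G_2$ as

$$\begin{aligned}
G_2 &= P_n(X_1,\dots,X_n\mid\hat\theta_n)\int_{|\theta-\theta_0|<\delta}\Pi(\theta)\exp\{L_n(\theta)-L_n(\hat\theta_n)\}\,d\theta \\
&= P_n(X_1,\dots,X_n\mid\hat\theta_n)\int_{|\theta-\theta_0|<\delta}\Pi(\theta)\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1-R_n)\Big\}\,d\theta \\
&= P_n(X_1,\dots,X_n\mid\hat\theta_n)\,\Pi(\theta_0)\int_{|\theta-\theta_0|<\delta}\frac{\Pi(\theta)}{\Pi(\theta_0)}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1-R_n)\Big\}\,d\theta.
\end{aligned}$$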


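We shall now observe

$$\frac{G_2}{P_n(X_1,\dots,X_n\mid\hat\theta_n)\,\sigma_n} = \frac{\Pi(\theta_0)}{\sigma_n}\int_{|\theta-\theta_0|<\delta}\frac{\Pi(\theta)}{\Pi(\theta_0)}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1-R_n)\Big\}\,d\theta. \qquad (66)$$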

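From condition (A1), in particular the continuity of $\Pi(\theta)$, for any $\varepsilon > 0$ we can choose $\delta$ such that $\{\theta : |\theta-\theta_0| < \delta\} \subset N_0$ and

$$1-\varepsilon < \inf_{|\theta-\theta_0|<\delta}\frac{\Pi(\theta)}{\Pi(\theta_0)} \le \sup_{|\theta-\theta_0|<\delta}\frac{\Pi(\theta)}{\Pi(\theta_0)} < 1+\varepsilon. \qquad (67)$$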


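Then, using (66),

$$(1-\varepsilon)\,\Pi(\theta_0)\frac{G_3}{\sigma_n} \le \frac{G_2}{P_n(X_1,\dots,X_n\mid\hat\theta_n)\,\sigma_n} \le (1+\varepsilon)\,\Pi(\theta_0)\frac{G_3}{\sigma_n}, \qquad (68)$$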
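where

$$G_3 = \int_{|\theta-\theta_0|<\delta}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1-R_n)\Big\}\,d\theta. \qquad (69)$$

For any $\varepsilon > 0$, define

$$C_n = \Big[\sup_{|\theta-\theta_0|<\delta}|R_n(\theta, X_1,\dots,X_n)| < \varepsilon\Big], \qquad (70)$$

$$D_n = \Big[\int_{|\theta-\theta_0|<\delta}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1+\varepsilon)\Big\}\,d\theta \le G_3 \le \int_{|\theta-\theta_0|<\delta}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1-\varepsilon)\Big\}\,d\theta\Big]. \qquad (71)$$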

Now we should get rid of $R_n$. Since $C_n \subset D_n$ and, for any $\varepsilon > 0$, Lemma 3.2 gives $\lim_{n\to\infty} P_{\theta_0}\{C_n\} = 1$, this implies $\lim_{n\to\infty} P_{\theta_0}\{D_n\} = 1$.


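That is, the probability of the event

$$\int_{|\theta-\theta_0|<\delta}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1+\varepsilon)\Big\}\,d\theta \le G_3 \le \int_{|\theta-\theta_0|<\delta}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1-\varepsilon)\Big\}\,d\theta \qquad (72)$$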

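converges to 1 as $n\to\infty$. Therefore, recalling (17), (65), (68), and (69), the only thing left to establish (13) is to observe that

$$\int_{|\theta-\theta_0|<\delta}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1+\varepsilon^*)\Big\}\,d\theta = (2\pi)^{1/2}(1+\varepsilon^*)^{-1/2}\sigma_n\Big[\Phi\{\sigma_n^{-1}(\theta_0+\delta-\hat\theta_n)(1+\varepsilon^*)^{1/2}\} - \Phi\{\sigma_n^{-1}(\theta_0-\delta-\hat\theta_n)(1+\varepsilon^*)^{1/2}\}\Big], \qquad (73)$$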

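where $\varepsilon^* = \varepsilon$ or $-\varepsilon$ and $\Phi$ is the standard normal distribution function. Since $\hat\theta_n$ is consistent and $\sigma_n^{-1}\to\infty$ in probability, when $\varepsilon < 1$,

$$\theta_0+\delta-\hat\theta_n \to \delta \quad\text{in } P_{\theta_0},$$

$$\theta_0-\delta-\hat\theta_n \to -\delta \quad\text{in } P_{\theta_0},$$

$$\sigma_n^{-1}(\theta_0+\delta-\hat\theta_n)(1+\varepsilon^*)^{1/2} \to \infty \quad\text{in } P_{\theta_0},$$

$$\sigma_n^{-1}(\theta_0-\delta-\hat\theta_n)(1+\varepsilon^*)^{1/2} \to -\infty \quad\text{in } P_{\theta_0}.$$

So

$$\Phi\{\sigma_n^{-1}(\theta_0+\delta-\hat\theta_n)(1+\varepsilon^*)^{1/2}\} \to 1 \quad\text{in } P_{\theta_0},$$

$$\Phi\{\sigma_n^{-1}(\theta_0-\delta-\hat\theta_n)(1+\varepsilon^*)^{1/2}\} \to 0 \quad\text{in } P_{\theta_0}.$$

Therefore, the difference in the square brackets of (73) converges to unity in probability. Since $\varepsilon$ is arbitrary, this proves (13).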
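Identity (73) is a change of variables in a Gaussian integral. A quick numerical check, with arbitrary illustrative values for $\theta_0$, $\delta$, $\hat\theta_n$, and $\sigma_n$ (none of them come from the text), compares a composite Simpson-rule evaluation of the left side with the closed form, and then shows the bracket tending to 1 as $\sigma_n$ shrinks.

```python
import math


def phi(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lhs_73(theta0, delta, theta_hat, sigma, eps_star, m=20000):
    """Left side of (73) by the composite Simpson rule over (theta0 - delta, theta0 + delta)."""
    lo, hi = theta0 - delta, theta0 + delta
    h = (hi - lo) / m  # m must be even for Simpson's rule

    def f(t):
        return math.exp(-(t - theta_hat) ** 2 / (2.0 * sigma ** 2) * (1.0 + eps_star))

    total = f(lo) + f(hi)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def rhs_73(theta0, delta, theta_hat, sigma, eps_star):
    """Right side of (73) in closed form."""
    s = math.sqrt(1.0 + eps_star)
    return (math.sqrt(2.0 * math.pi) / s * sigma
            * (phi((theta0 + delta - theta_hat) * s / sigma)
               - phi((theta0 - delta - theta_hat) * s / sigma)))


args = (0.0, 0.3, 0.05, 0.1)  # illustrative theta0, delta, theta_hat, sigma
for eps_star in (0.2, -0.2):
    print(eps_star, lhs_73(*args, eps_star), rhs_73(*args, eps_star))

# The bracket of (73) is rhs divided by (2*pi)^(1/2) (1 + eps*)^(-1/2) sigma.
for sigma in (0.2, 0.05, 0.01):
    scale = math.sqrt(2.0 * math.pi) / math.sqrt(1.2) * sigma
    print(sigma, rhs_73(0.0, 0.3, 0.05, sigma, 0.2) / scale)
```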


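Now we prove (14). First of all, we consider (12) and (17) again: $G$ and $G_2$ are the same except for their regions of integration; one is $(\hat\theta_n+a\sigma_n, \hat\theta_n+b\sigma_n)$ and the other is $\{\theta : |\theta-\theta_0| < \delta\}$. For the same $\varepsilon$ and $\delta$ given by (67), if $(\hat\theta_n+a\sigma_n, \hat\theta_n+b\sigma_n)$ is a subset of $\{\theta : |\theta-\theta_0| < \delta\}$, we must have

$$1-\varepsilon < \inf_{(\hat\theta_n+a\sigma_n,\,\hat\theta_n+b\sigma_n)}\frac{\Pi(\theta)}{\Pi(\theta_0)} \le \sup_{(\hat\theta_n+a\sigma_n,\,\hat\theta_n+b\sigma_n)}\frac{\Pi(\theta)}{\Pi(\theta_0)} < 1+\varepsilon. \qquad (74)$$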


Define

$$E_n = [(\hat\theta_n+a\sigma_n, \hat\theta_n+b\sigma_n) \subset \{\theta : |\theta-\theta_0| < \delta\}].$$

Since $\hat\theta_n \to \theta_0$ in $P_{\theta_0}$ and $\sigma_n \to 0$ in $P_{\theta_0}$,

$$P_{\theta_0}\{E_n\} \to 1 \quad\text{as } n\to\infty, \qquad (75)$$

and hence the probability of (74) converges to 1 as $n\to\infty$. Consider (68) again. If $(\hat\theta_n+a\sigma_n, \hat\theta_n+b\sigma_n)$ is a subset of $\{\theta : |\theta-\theta_0| < \delta\}$ and we replace the region of integration in (68) by $(\hat\theta_n+a\sigma_n, \hat\theta_n+b\sigma_n)$, then the new inequality (76) below still holds.


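$$(1-\varepsilon)\,\Pi(\theta_0)\frac{G_3'}{\sigma_n} \le \frac{G}{P_n(X_1,\dots,X_n\mid\hat\theta_n)\,\sigma_n} \le (1+\varepsilon)\,\Pi(\theta_0)\frac{G_3'}{\sigma_n}, \qquad (76)$$

where

$$G_3' = \int_{\hat\theta_n+a\sigma_n}^{\hat\theta_n+b\sigma_n}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1-R_n)\Big\}\,d\theta. \qquad (77)$$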
Because of (75), the probability of the event indicated by (76) converges to 1 as $n\to\infty$. For the same $\varepsilon$ given by (72), define

$$C_n' = \Big[\sup_{(\hat\theta_n+a\sigma_n,\,\hat\theta_n+b\sigma_n)}|R_n(\theta, X_1,\dots,X_n)| < \varepsilon\Big], \qquad (78)$$


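and

$$D_n' = \Big[\int_{\hat\theta_n+a\sigma_n}^{\hat\theta_n+b\sigma_n}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1+\varepsilon)\Big\}\,d\theta \le G_3' \le \int_{\hat\theta_n+a\sigma_n}^{\hat\theta_n+b\sigma_n}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1-\varepsilon)\Big\}\,d\theta\Big]. \qquad (79)$$

Since $E_n \cap C_n \subset C_n' \subset D_n'$, (75) and $\lim_{n\to\infty} P_{\theta_0}\{C_n\} = 1$ give

$$P_{\theta_0}\{D_n'\} \to 1 \quad\text{as } n\to\infty.$$

Similar to (73), we now estimate

$$\int_{\hat\theta_n+a\sigma_n}^{\hat\theta_n+b\sigma_n}\exp\Big\{-\frac{(\theta-\hat\theta_n)^2}{2\sigma_n^2}(1+\varepsilon^*)\Big\}\,d\theta, \qquad (80)$$

where $\varepsilon^* = \varepsilon$ or $-\varepsilon$. It is obvious that the quantity in (80) is equal to

$$(2\pi)^{1/2}\sigma_n(1+\varepsilon^*)^{-1/2}\Big[\Phi\{b(1+\varepsilon^*)^{1/2}\} - \Phi\{a(1+\varepsilon^*)^{1/2}\}\Big].$$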


Since we can make $\varepsilon$ arbitrarily small, using (76) and (77) we can finally obtain

$$\frac{G}{P_n(X_1,\dots,X_n\mid\hat\theta_n)\,\sigma_n} \to (2\pi)^{1/2}\,\Pi(\theta_0)\{\Phi(b)-\Phi(a)\}$$

in probability $P_{\theta_0}$.
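Read together with (13), the limit just obtained says that the posterior probability of $(\hat\theta_n+a\sigma_n, \hat\theta_n+b\sigma_n)$ approaches $\Phi(b)-\Phi(a)$. The sketch below illustrates this for one simulated examinee under assumptions that are not part of the theorem: a 2PL model with hypothetical items, a standard normal prior Π, and a posterior computed on a grid truncated to $\hat\theta_n \pm 8\sigma_n$.

```python
import math
import random


def p2pl(theta, a, b):
    """Hypothetical 2PL item response function (illustration only)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_lik(theta, responses, items):
    """L_n(theta) for the 2PL model."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p2pl(theta, a, b)
        total += math.log(p) if x else math.log(1.0 - p)
    return total


def mle_and_sigma(responses, items, lo=-6.0, hi=6.0, tol=1e-10):
    """MLE by bisection on the decreasing 2PL score, and sigma_n = I^(n)(theta_hat)^(-1/2)."""
    def score(t):
        return sum(a * (x - p2pl(t, a, b)) for x, (a, b) in zip(responses, items))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
    theta_hat = 0.5 * (lo + hi)
    info = sum(a * a * p2pl(theta_hat, a, b) * (1.0 - p2pl(theta_hat, a, b)) for a, b in items)
    return theta_hat, 1.0 / math.sqrt(info)


def posterior_mass(a_std, b_std, responses, items, width=8.0, points=2001):
    """Posterior probability of (theta_hat + a*sigma, theta_hat + b*sigma) under an
    assumed N(0, 1) prior, computed on a grid over theta_hat +/- width*sigma."""
    theta_hat, sigma = mle_and_sigma(responses, items)
    grid = [theta_hat + sigma * (-width + 2.0 * width * k / (points - 1)) for k in range(points)]
    logpost = [log_lik(t, responses, items) - 0.5 * t * t for t in grid]
    top = max(logpost)
    w = [math.exp(lp - top) for lp in logpost]  # unnormalized weights, numerically stable
    inside = sum(wk for t, wk in zip(grid, w)
                 if theta_hat + a_std * sigma < t < theta_hat + b_std * sigma)
    return inside / sum(w)


rng = random.Random(5)
theta0 = 0.3
items = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(500)]
responses = [1 if rng.random() < p2pl(theta0, a, b) else 0 for a, b in items]
target = math.erf(1.0 / math.sqrt(2.0))  # Phi(1) - Phi(-1)
print(posterior_mass(-1.0, 1.0, responses, items), target)
```

A symmetric interval is used because the leading skewness correction cancels over it; asymmetric intervals typically approach $\Phi(b)-\Phi(a)$ more slowly.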


B The Proof of Strong Convergence

The proof of Theorem 3.2 is analogous to that of Theorem 3.1 and is also based on two lemmas and one corollary. However, these intermediate results are stronger than those used in proving Theorem 3.1.
Lemma B.1 Under the assumptions of Lemma 3.1, for any given $\delta > 0$, there exists $k(\delta) > 0$ such that

$$P_{\theta_0}\Big\{\limsup_{n\to\infty}\sup_{|\theta-\theta_0|\ge\delta} n^{-1}[L_n(\theta)-L_n(\theta_0)] \le -k(\delta)\Big\} = 1. \qquad (81)$$

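Proof: The proof of (81) is analogous to that of Lemma 3.1 except for the following two changes:

(1) replacing (39) by

$$P_{\theta_0}\Big\{\limsup_{n\to\infty}\sup_{|\theta-\theta_i|<\delta} n^{-1}[L_n(\theta)-L_n(\theta_0)] \le -c_i\Big\} = 1; \qquad (82)$$

(2) replacing (41) by

$$A_i = \Big\{\limsup_{n\to\infty}\sup_{|\theta-\theta_i|<\delta} n^{-1}[L_n(\theta)-L_n(\theta_0)] \le -c_i\Big\}.$$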

Now we need only prove (82). Since

$$\limsup_{n\to\infty} n^{-1}[L_n(\theta_i)-L_n(\theta_0)]$$

is measurable with respect to the tail $\sigma$-field

$$\bigcap_{n=1}^{\infty}\sigma(Z_n(\theta_i), Z_{n+1}(\theta_i), \dots),$$

by Kolmogorov's zero-one law (Billingsley, p. 295) it must be a "nonrandom" constant with probability 1. Denote this constant by $\eta$. According to (40),

$$P_{\theta_0}\Big\{\eta = \limsup_{n\to\infty} n^{-1}[L_n(\theta_i)-L_n(\theta_0)] \le -c(\theta_i) < 0\Big\} = 1.$$

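Choose

$$\varepsilon = \frac{-c(\theta_i)-\eta}{2},$$

which is positive once $c(\theta_i)$ is taken small enough that $\eta < -c(\theta_i)$, and choose $\delta$ small enough that

$$\limsup_{n\to\infty} n^{-1}\sum_{j=1}^n H_j(\delta,\theta_i) < \varepsilon$$

(see (34) for the definition of $H_j(\delta,\theta_i)$); thus

$$\limsup_{n\to\infty}\sup_{|\theta-\theta_i|<\delta} n^{-1}[L_n(\theta)-L_n(\theta_0)] \le \limsup_{n\to\infty} n^{-1}[L_n(\theta_i)-L_n(\theta_0)] + \limsup_{n\to\infty} n^{-1}\sum_{j=1}^n H_j(\delta,\theta_i) < \eta + \varepsilon < -c(\theta_i)$$

almost surely. Thus (82) holds with $c_i = c(\theta_i)$. ■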

Corollary B.1 Lemma B.1 ensures that

$$P_{\theta_0}\Big\{\lim_{n\to\infty}\hat\theta_n = \theta_0\Big\} = 1.$$

Proof: Analogous to that of Wald (1949) and omitted. ■

Lemma B.2 Under the assumptions of Lemma 3.2, for any $\varepsilon > 0$, there exists $\delta$ such that

$$P_{\theta_0}\Big\{\limsup_{n\to\infty}\sup_{|\theta-\theta_0|<\delta}|R_n(\theta, X_1,\dots,X_n)| < \varepsilon\Big\} = 1. \qquad (83)$$

Proof: Analogous to that of Lemma 3.2 and omitted. ■

Proof of Theorem 3.2: The proof is based on Lemma B.1, Lemma B.2, and Corollary B.1. The basic steps are analogous to those of Theorem 3.1 and are omitted. ■


C The Proof of Convergence in Manifest Probability

Proof of Theorem 3.3: Theorem 3.1 implies that for arbitrary $\theta$ and arbitrary $\varepsilon > 0$,

$$P_\theta\{|A_n(X_1,\dots,X_n)-A| > \varepsilon\} \to 0$$
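as $n\to\infty$. Define

$$H_n(\theta,\varepsilon) = P_\theta\{|A_n(X_1,\dots,X_n)-A| > \varepsilon\}.$$

It is clear that for any $\theta$ and $\varepsilon > 0$,

$$0 \le H_n(\theta,\varepsilon) \le 1 \quad\text{and}\quad \lim_{n\to\infty} H_n(\theta,\varepsilon) = 0.$$

By Lebesgue's bounded convergence theorem (Billingsley, p. 214),

$$\int_\Theta H_n(\theta,\varepsilon)\,\Pi(\theta)\,d\theta \to 0.$$

That is,

$$P\{|A_n(X_1,\dots,X_n)-A| > \varepsilon\} = \int_\Theta P\{|A_n(X_1,\dots,X_n)-A| > \varepsilon \mid \theta\}\,\Pi(\theta)\,d\theta = \int_\Theta H_n(\theta,\varepsilon)\,\Pi(\theta)\,d\theta \to 0.$$

This proves Theorem 3.3. ■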